Abstract — Optimal prefix codes are studied for pairs of independent, integer-valued symbols emitted by a source with a geometric probability distribution of parameter $q$, $0 < q < 1$. By encoding pairs of symbols, it may be possible to reduce the redundancy penalty of symbol-by-symbol encoding, while preserving the simplicity of the encoding and decoding procedures typical of Golomb codes and their variants. It is shown that optimal codes for these so-called two-dimensional geometric distributions are parameter-singular, in the sense that a prefix code that is optimal for one value of the parameter $q$ cannot be optimal for any other value of $q$. This is in sharp contrast to the one-dimensional case, where codes are optimal for positive-length intervals of the parameter $q$. Thus, in the two-dimensional case, it is infeasible to give a compact characterization of optimal codes for all values of the parameter $q$, as was done in the one-dimensional case. Instead, optimal codes are characterized for a discrete sequence of values of $q$ that provides good coverage of the unit interval. Specifically, optimal prefix codes are described for $q = 2^{-1/k}$ ($k \ge 1$), covering the range $q \ge 1/2$, and $q = 2^{-k}$ ($k > 1$), covering the range $q < 1/2$. The described codes produce the expected reduction in redundancy with respect to the one-dimensional case, while maintaining low-complexity coding operations.

Index terms — geometric distributions, prefix codes, Huffman 
codes, Golomb codes, codes for countable alphabets, lossless 
compression 



I. Introduction 

In 1966, Golomb [1] described optimal binary prefix codes for some geometric distributions over the nonnegative integers, namely, distributions with probabilities $p(i)$ of the form

$$p(i) = (1-q)\,q^{i}, \quad i \ge 0,$$

for some real-valued parameter $q$, $0 < q < 1$. In [2], these Golomb codes were shown to be optimal for all geometric distributions. These distributions occur, for example, when encoding run lengths (the original motivation in [1]), and in image compression when encoding prediction residuals, which are well-modeled by two-sided geometric distributions. Optimal codes for the latter were characterized in [3], based on some combinations and variants of Golomb codes. Codes based on the Golomb construction have the practical advantage of allowing the encoding of a symbol $i$ using a simple explicit computation on the integer value of $i$, without recourse to nontrivial data structures or tables. This has led to their adoption in many practical applications (cf. [4], [5]).

Symbol-by-symbol encoding, however, can incur significant redundancy relative to the entropy of the distribution, even when dealing with sequences of independent, identically distributed random variables. One way to mitigate this problem, while keeping the simplicity and low latency of the encoding and decoding operations, is to consider short blocks of $d > 1$ symbols, and use a prefix code for the blocks. In this paper, we study optimal prefix codes for pairs (blocks of length $d = 2$) of independent, identically distributed geometric random variables, namely, distributions on pairs of nonnegative integers $(i, j)$ with probabilities of the form

$$P(i,j) = p(i)\,p(j) = (1-q)^2\,q^{i+j}, \quad i, j \ge 0. \tag{1}$$

We refer to this distribution as a two-dimensional geometric distribution (TDGD), defined on the alphabet of integer pairs $\mathcal{A} = \{(i,j) \mid i, j \ge 0\}$. For succinctness, we denote a TDGD of parameter $q$ by TDGD($q$).
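As a quick numerical check of (1), the following sketch evaluates $P(i,j)$, confirms that the probabilities over a truncated alphabet sum to (nearly) one, and compares the truncated entropy with the closed form $2h(q)/(1-q)$. Here $h$ is the binary entropy function, and $2h(q)/(1-q)$ is the entropy of a pair of independent geometric variables. The sketch is illustrative only and is not part of the original paper; the function names are hypothetical.

```python
import math


def tdgd_prob(i: int, j: int, q: float) -> float:
    """P(i, j) = (1 - q)^2 * q^(i + j), as in eq. (1)."""
    return (1.0 - q) ** 2 * q ** (i + j)


def tdgd_entropy_closed_form(q: float) -> float:
    """Entropy (bits) of a TDGD: twice the entropy h(q)/(1-q) of one geometric variable."""
    h = -(q * math.log2(q) + (1.0 - q) * math.log2(1.0 - q))
    return 2.0 * h / (1.0 - q)


def truncated_sums(q: float, n: int):
    """Total probability and entropy (bits) over the pairs 0 <= i, j < n."""
    total, entropy = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            p = tdgd_prob(i, j, q)
            if p > 0.0:
                total += p
                entropy -= p * math.log2(p)
    return total, entropy


if __name__ == "__main__":
    q = 0.8
    total, h_trunc = truncated_sums(q, 400)
    print(f"sum of P over truncated alphabet: {total:.12f}")
    print(f"truncated entropy: {h_trunc:.9f}")
    print(f"closed form:       {tdgd_entropy_closed_form(q):.9f}")
```

The truncation at 400 per coordinate is an arbitrary choice. For $q = 0.8$, the omitted tail has negligible probability.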

Aside from the mentioned practical motivation, the problem is of intrinsic combinatorial interest. It was proved in [6] (see also [7]) that, if the entropy¹ $-\sum_{a \in \mathcal{A}} P(a) \log P(a)$ of a distribution over a countable alphabet $\mathcal{A}$ is finite, optimal codes exist and can be obtained, in the limit, from Huffman codes for truncated versions of the alphabet. However, the proof does not give a general way for effectively constructing optimal codes. In fact, there are few families of distributions over countable alphabets for which an effective construction is known [8], [9]. An algorithmic approach to building optimal codes is presented in [9], which covers geometric distributions and various generalizations. The approach, though, is not applicable to TDGDs, as explicitly noted in [9].

Some characteristic properties of the families of optimal codes for geometric and related distributions in the one-dimensional case turn out not to hold in the two-dimensional case. Specifically, the optimal codes described in [1] and [3] correspond to binary trees of bounded width, namely, the number of codewords of any given length is upper-bounded by a quantity that depends only on the code parameters. Also, the family of optimal codes in each case partitions the parameter space into regions of positive volume, such that all the corresponding distributions in a region admit the same optimal code. These properties do not hold in the case of optimal codes for TDGDs. In particular, optimal codes for TDGDs turn out to be parameter-singular, in the sense that if a code $T_q$ is optimal for TDGD($q$), then $T_q$ is not optimal for TDGD($q'$) for any parameter value $q' \ne q$. This result is presented in Section III. (A related but somewhat dual problem, namely, counting the number of distinct trees that can be optimal for a given source over a countable alphabet, is studied in [10].)

¹ $\log x$ and $\ln x$ will denote, respectively, the base-2 and the natural logarithm of $x$.

An important consequence of this singularity is that any set containing optimal codes for all values of $q$ must be uncountable. Thus, it would be infeasible to give a compact characterization of such a set, as was done in [1] or [3] for one-dimensional cases.² From a practical point of view, then, the best we can expect is to characterize optimal codes for countable sequences of parameter values. In this paper, we present such a characterization, for a sequence of parameter values that provides good coverage of the range $0 < q < 1$. Specifically, in Section IV we describe the construction of optimal codes for TDGD($q$) with $q = 2^{-1/k}$ for integers $k \ge 1$,³ covering the range $q \ge 1/2$. In Section V we do so for TDGD($q$) with $q = 2^{-k}$ for integers $k > 1$, covering the range $q < 1/2$. Overall, therefore, we show optimal codes for all values of $q$ such that $-\log q$ is either an integer or the inverse of one. In the case $q < 1/2$, we observe that, as $k \to \infty$ (i.e., $q \to 0$), the optimal codes described converge to a limit code. By this we mean that the codeword for any given pair $(a, b)$ remains the same for all $k \ge k_0(a, b)$, where $k_0$ is a threshold that can be computed from $a$ and $b$ (this limit code is also mentioned, without proofs, in [11]). The codes in both constructions are of unbounded width. However, they are regular [12], in the sense that the corresponding infinite trees have only a finite number of non-isomorphic whole subtrees (i.e., subtrees consisting of a node and all of its descendants). This allows for deriving recursions and explicit expressions for the average code length, as well as feasible encoding/decoding procedures. Notice that, to the best of our knowledge, the only case for which an optimal code for a TDGD had been characterized prior to this work was the trivial case $q = 1/2$. In that case, encoding each component of $(i, j)$ separately with a unary code (i.e., a Golomb code of order one) has zero redundancy, and is thus optimal (cf. also [11]).
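The zero-redundancy claim for $q = 1/2$ can be checked directly. Under (1), $-\log P(i,j) = 2 + i + j$, which is exactly the length $(i+1) + (j+1)$ of the two concatenated unary codewords. The sketch below verifies this equality for a range of pairs. It writes the unary codeword of $n$ as $n$ ones followed by a zero, as in Section II, and the helper names are hypothetical.

```python
import math


def unary(n: int) -> str:
    """Unary codeword for n >= 0: n ones followed by a zero (the code G_1)."""
    return "1" * n + "0"


def pair_codeword(i: int, j: int) -> str:
    """Encode (i, j) by concatenating unary codewords, i.e., the code G_1 . G_1."""
    return unary(i) + unary(j)


def self_information_half(i: int, j: int) -> float:
    """-log2 P(i, j) for the TDGD with q = 1/2."""
    p = (1 - 0.5) ** 2 * 0.5 ** (i + j)
    return -math.log2(p)


for i in range(20):
    for j in range(20):
        # Code length equals self-information, so the redundancy is zero.
        assert len(pair_codeword(i, j)) == self_information_half(i, j)

print(pair_codeword(2, 0))  # prints 1100
```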

Practical considerations, and the redundancy of the new codes, are discussed in Section VI. There we present redundancy plots and comparisons with symbol-by-symbol Golomb coding and with the optimal code for a TDGD for each plotted value of $q$ (optimal average code lengths for arbitrary values of $q$ were estimated numerically to sufficiently high precision). We also derive an exact expression for the asymptotic oscillatory behavior of the redundancy of the new codes as $q \to 1$. The study confirms the redundancy gains over symbol-by-symbol encoding with Golomb codes. It also confirms that the discrete sequence of codes presented provides a good approximation to the full class of optimal codes over the range of the parameter $q$.

² Loosely, by a compact characterization we mean one in which each code is characterized by a finite number of finite parameters, which drive the corresponding encoding/decoding procedures.

³ These are the same distributions for which optimality of Golomb codes was originally established in [1].

Our constructions and proofs of optimality rely on the technique of Gallager and Van Voorhis [2], which was also used in [3]. As noted in [3], most of the work and ingenuity in applying the technique goes into discovering appropriate "guesses" of the basic components on which the construction iterates, and into describing the structure of the resulting codes. With the correct guesses, the proofs are straightforward. The technique of [2] is reviewed in Section II, where we also introduce some definitions and notation that will be useful throughout the paper.

II. Preliminaries 

A. Definitions 

We are interested in encoding the alphabet $\mathcal{A}$ of integer pairs $(i, j)$, $i, j \ge 0$, using a binary prefix code $C$. We will refer to $C$ plainly as a code, with the binary and prefix properties assumed throughout. As usual, we associate $C$ with a rooted (infinite) binary tree, whose leaves correspond, bijectively, to symbols in $\mathcal{A}$, and where each branch is labeled with a binary digit. The binary codeword assigned to a symbol is "read off" the labels on the path from the root to the corresponding leaf. The depth of a node $x$ in a tree $T$, denoted $\mathrm{depth}_T(x)$, is the number of branches on the path from the root to $x$. By extension, the depth (or height) of a finite tree is defined as the maximal depth of any of its nodes. A level of $T$ is the set of all nodes at a given depth $\ell$ (we refer to this set as level $\ell$). Let $n_\ell^T$ denote the number of leaves in level $\ell$ of $T$ (we will sometimes omit the superscript $T$ when clear from the context). We refer to the sequence $\{n_\ell^T\}_{\ell \ge 0}$ as the profile of $T$. Two trees will be considered equivalent if their profiles are identical. Thus, for a code $C$, we are only interested in its tree profile, or, equivalently, the length distribution of its codewords. Given the profile of a tree, and an ordering of $\mathcal{A}$ in decreasing probability order, it is always possible to define a canonical tree (say, by assigning leaves in alphabetical order; see, e.g., [13]) that uniquely defines a code for $\mathcal{A}$. The notion of tree equivalence adopted implies that, given a tree, we can arbitrarily permute the nodes at any level, since such a permutation leaves the profile invariant. This will allow us to make, without loss of generality, certain assumptions on the structure of the tree. In particular, we will often assume the following: if a tree contains, say, at least $2^j$ leaves at a certain level $i$, then there is a set of $2^j$ leaves at level $i$ that have a common ancestor⁴ $u$ at level $i - j$. An alphabetically ordered tree, in fact, always has this property.
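The sketch below illustrates how a profile (a length distribution) determines a code once the symbols are ordered by decreasing probability. It uses one common canonical assignment, chosen here for illustration: codewords are consecutive binary numbers, shifted left whenever the length increases. We assume this convention for the example; it is not necessarily the exact ordering used in [13]. The function names are hypothetical.

```python
from fractions import Fraction


def canonical_code(lengths):
    """Canonical codewords for code lengths given in nondecreasing order.

    Symbol m (in decreasing-probability order) receives the m-th codeword.
    Raises ValueError if the lengths are not sorted or violate Kraft's inequality.
    """
    if any(b < a for a, b in zip(lengths, lengths[1:])):
        raise ValueError("lengths must be nondecreasing")
    if sum(Fraction(1, 2 ** n) for n in lengths) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    codewords = []
    code = 0
    prev = lengths[0] if lengths else 0
    for n in lengths:
        code <<= n - prev  # append zeros when moving down to a deeper level
        codewords.append(format(code, "b").zfill(n) if n > 0 else "")
        code += 1  # next leaf at the same level
        prev = n
    return codewords


def is_prefix_free(words):
    """True if no codeword is a proper prefix of another."""
    return not any(a != b and b.startswith(a) for a in words for b in words)


# Profile n_1 = 1, n_2 = 1, n_3 = 2, written as a list of code lengths.
print(canonical_code([1, 2, 3, 3]))  # ['0', '10', '110', '111']
```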

With a slight abuse of terminology, we will not distinguish between a code and its corresponding tree (or profile), and will refer to the same object sometimes as a tree and sometimes as a code. Unless noted otherwise, all trees considered in this paper are full, i.e., every node in the tree is either a leaf or the parent of two children (full trees are sometimes referred to in the literature as complete). A tree is balanced (or uniform) if it has $2^k$ leaves, all of them at depth $k$, for some $k \ge 0$. We denote such a tree by $U_k$. We will restrict the use of the term subtree to refer to whole subtrees of $T$, i.e., subtrees that consist of a node and all of its descendants in $T$.

⁴ We use the usual "family" terminology for trees: nodes have children, parents, ancestors and descendants. We also use the common convention of visualizing trees with the root at the top and leaves at the bottom. Thus, ancestors are "up," and descendants are "down."

We call $s(i, j) = i + j$ the signature of $(i, j) \in \mathcal{A}$. For a given value $s = s(i, j)$, there are $s + 1$ pairs with signature $s$. Under the distribution (1), they all have the same probability, $P(s) = (1-q)^2 q^{s}$. Given a code $C$, symbols of the same signature can be freely permuted without affecting the properties of interest to us (e.g., average code length). Thus, for simplicity, we can also regard the correspondence between leaves and symbols as one between leaves and elements of the multiset



$$\mathcal{S} = \{0, 1, 1, 2, 2, 2, \ldots, \underbrace{s, \ldots, s}_{s+1 \text{ times}}, \ldots\}. \tag{2}$$

In constructing the tree, we do not distinguish between different occurrences of a signature $s$. For actual encoding, the $s+1$ leaves labeled with $s$ are mapped to the symbols $(0, s), (1, s-1), \ldots, (s, 0)$ in some fixed order. In the sequel, we will often ignore normalization factors for the signature probabilities $P(s)$ (in cases where normalization is inconsequential), and will use instead weights $w(s) = q^{s}$.

Consider a tree (or code) $T$ for $\mathcal{A}$. Let $U$ be a subtree of $T$, and let $s(x)$ denote the signature associated with a leaf $x$ of $U$. Let $F(U)$ denote the set of leaves of $U$, referred to as its fringe. We define the weight, $W_q(U)$, of $U$ as

$$W_q(U) = \sum_{x \in F(U)} q^{s(x)},$$

and the cost, $C_q(U)$, of $U$ as

$$C_q(U) = \sum_{x \in F(U)} \mathrm{depth}_U(x)\, q^{s(x)}$$

(the subscript $q$ may be omitted when clear from the context). When $U = T$, we have $W_q(T) = (1-q)^{-2}$, and $(1-q)^2 C_q(T)$ is the average code length of $T$. A tree $T$ is optimal for TDGD($q$) if $C_q(T) \le C_q(T')$ for any tree $T'$.

B. Some basic objects and operations

For $\alpha \ge 1$, we say that a finite source with probabilities $p_1 \ge p_2 \ge \cdots \ge p_N$, $N \ge 2$, is $\alpha$-uniform if $p_1/p_N \le \alpha$. A 2-uniform source is also called quasi-uniform. An optimal code for a quasi-uniform source on $N$ symbols consists of $2^{\lceil \log N \rceil} - N$ codewords of length $\lfloor \log N \rfloor$ and $2N - 2^{\lceil \log N \rceil}$ codewords of length $\lceil \log N \rceil$. The shorter codewords correspond to the more probable symbols [2]. We refer to such a code (or the associated tree) also as quasi-uniform, denote it by $Q_N$, and denote by $Q_N(i)$ the codeword it assigns to the symbol associated with $p_i$, $1 \le i \le N$. For convenience, we define $Q_1$ as a null code, which assigns code length zero to the single symbol in the alphabet. Clearly, for integers $k \ge 0$, we have $Q_{2^k} = U_k$. The fringe thickness of a finite tree $T$, denoted $f_T$, is the maximum difference between the depths of any two leaves of $T$. Quasi-uniform trees $T$ have $f_T \le 1$, while uniform trees have $f_T = 0$. In Section IV we present a characterization of optimal codes of fringe thickness two for 4-uniform distributions, which generalizes the quasi-uniform case. This generalization will help in the characterization of the optimal codes for TDGD($q$), $q = 2^{-1/k}$.
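The length distribution of $Q_N$ stated above can be tabulated directly. The following sketch returns the code lengths of $Q_N$ in decreasing-probability order. It checks that these lengths satisfy the Kraft equality, i.e., that they describe a full tree. It also computes the fringe thickness $f_T$ of the resulting tree from its leaf depths. The helper names are hypothetical and not taken from the paper.

```python
from fractions import Fraction


def ceil_log2(n: int) -> int:
    """Smallest c with 2**c >= n, for n >= 1."""
    return (n - 1).bit_length()


def quasi_uniform_lengths(n: int):
    """Code lengths of Q_n: 2^ceil(log n) - n short words, then 2n - 2^ceil(log n) long ones."""
    if n < 1:
        raise ValueError("n must be positive")
    c = ceil_log2(n)
    short = (1 << c) - n      # codewords of length floor(log n)
    long_ = 2 * n - (1 << c)  # codewords of length ceil(log n)
    # When n is a power of two, short == 0 and all lengths equal c.
    return [c - 1] * short + [c] * long_


def fringe_thickness(lengths):
    """Maximum difference between the depths of any two leaves."""
    return max(lengths) - min(lengths)


for n in range(1, 65):
    lengths = quasi_uniform_lengths(n)
    assert len(lengths) == n
    assert sum(Fraction(1, 2 ** d) for d in lengths) == 1  # full tree
    assert fringe_thickness(lengths) <= 1                   # f_T <= 1

print(quasi_uniform_lengths(5))  # [2, 2, 2, 3, 3]
```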

The concatenation of two trees $T$ and $U$, denoted $T \cdot U$, is obtained by attaching a copy of $U$ to each leaf of $T$. Regarded as a code, $T \cdot U$ consists of all the possible concatenations $t \cdot u$ of a word $t \in T$ with one $u \in U$. The Golomb code of order $k \ge 1$ [1], denoted $G_k$, encodes an integer $i$ by concatenating $Q_k(i \bmod k)$ with a unary encoding of $\lfloor i/k \rfloor$ (e.g., $\lfloor i/k \rfloor$ ones followed by a zero). The first-order Golomb code $G_1$ is just the unary code. Its tree consists of a root with one leaf child on the branch labeled '0', and, recursively, a copy of $G_1$ attached to the child on the branch labeled '1'. Thus, we have $G_k = Q_k \cdot G_1$.
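A direct way to realize $G_k = Q_k \cdot G_1$ is sketched below. The quasi-uniform part uses the usual truncated-binary assignment, which gives the shorter codewords to the smaller, more probable remainders. This choice is an assumption for illustration: any assignment with the length distribution of $Q_k$ would do. The unary part follows the convention of $\lfloor i/k \rfloor$ ones followed by a zero. The function names are hypothetical.

```python
def quasi_uniform_codeword(r: int, k: int) -> str:
    """A codeword Q_k(r) for remainders 0 <= r < k (truncated binary)."""
    if not 0 <= r < k:
        raise ValueError("need 0 <= r < k")
    if k == 1:
        return ""  # Q_1 is the null code
    c = (k - 1).bit_length()  # ceil(log2 k)
    short = (1 << c) - k      # number of codewords of length c - 1
    if r < short:
        return format(r, "b").zfill(c - 1)
    return format(r + short, "b").zfill(c)


def unary(n: int) -> str:
    """G_1: n ones followed by a zero."""
    return "1" * n + "0"


def golomb(i: int, k: int) -> str:
    """Golomb codeword G_k(i) = Q_k(i mod k) . G_1(floor(i / k))."""
    if i < 0 or k < 1:
        raise ValueError("need i >= 0 and k >= 1")
    return quasi_uniform_codeword(i % k, k) + unary(i // k)


print([golomb(i, 3) for i in range(6)])
# ['00', '100', '110', '010', '1010', '1110']
```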

C. The Gallager-Van Voorhis method 

When proving optimality of infinite codes for TDGDs, we will rely on the method due to Gallager and Van Voorhis [2], which is briefly outlined below, adapted to our setting and terminology.

• Define a sequence of finite reduced sources $\{S_t\}_{t \ge 0}$. The alphabet of the reduced source $S_t$ is a multiset $S_t = \mathcal{H}_t \cup \mathcal{F}_t$, where $\mathcal{H}_t$ is a multiset comprising the signatures $0, 1, \ldots, s-1$ (with multiplicities as in (2)). The set $\mathcal{F}_t$ consists of a finite number of (possibly infinite) subsets of $\mathcal{A}$, referred to as virtual symbols, which form a partition of the remaining signatures. We naturally associate with each virtual symbol a weight equal to the sum of the weights of the signatures it contains.

• Verify that the sequence $\{S_t\}_{t \ge 0}$ is compatible with the bottom-up Huffman procedure. This means that after a number of merging steps of the Huffman algorithm on the reduced source $S_t$, one gets $S_{t-1}$. Proceed recursively, until $S_0$ is obtained.

• Apply the Huffman algorithm to $S_0$.

While the sequence of reduced sources $S_t$ can be seen as evolving "bottom-up," the infinite code $C$ constructed results from a "top-down" sequence of corresponding finite codes $C_t$. The size of $C_t$ grows with $t$, and these codes unfold by recursive reversal of the mergers in the Huffman procedure. One shows that the sequence of codes $\{C_t\}_{t \ge 0}$ converges to an infinite code $C$. Convergence here means that, for every $j \ge 1$ and with the codewords of $C_t$ consistently sorted, the $j$th codeword of $C_t$ is eventually constant as $t$ grows, and equal to the $j$th codeword of $C$. A corresponding convergence argument on the sequence of average code lengths then establishes the optimality of $C$.
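The finite building block of this method, and of the limit argument of [6], [7] mentioned in Section I, is the ordinary Huffman procedure. The sketch below runs Huffman's algorithm on a truncated TDGD: signatures $0, \ldots, S$ with multiplicities as in (2) and weights $w(s) = q^{s}$. It is a rough illustration only, not the Gallager-Van Voorhis construction itself, whose reduced sources contain infinite virtual symbols. The truncation level `S` and the function names are assumptions made for the example.

```python
import heapq
import itertools
import math
from fractions import Fraction


def huffman_lengths(weights):
    """Binary Huffman code lengths for a list of positive weights."""
    if len(weights) == 1:
        return [0]
    tie = itertools.count()  # tie-breaker so heap entries never compare lists
    heap = [(w, next(tie), [idx]) for idx, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    while len(heap) > 1:
        w1, _, group1 = heapq.heappop(heap)
        w2, _, group2 = heapq.heappop(heap)
        for idx in group1 + group2:
            lengths[idx] += 1  # every merge pushes these leaves one level deeper
        heapq.heappush(heap, (w1 + w2, next(tie), group1 + group2))
    return lengths


def truncated_signature_weights(q: float, S: int):
    """Signatures 0..S, signature s repeated s + 1 times, weight q**s each."""
    sigs = [s for s in range(S + 1) for _ in range(s + 1)]
    return sigs, [q ** s for s in sigs]


if __name__ == "__main__":
    q, S = 0.8, 25
    sigs, weights = truncated_signature_weights(q, S)
    lengths = huffman_lengths(weights)
    total = sum(weights)
    avg = sum(w * n for w, n in zip(weights, lengths)) / total
    entropy = -sum((w / total) * math.log2(w / total) for w in weights)
    print(f"average length {avg:.4f}, entropy {entropy:.4f}")
    print("Kraft sum:", sum(Fraction(1, 2 ** n) for n in lengths))
```

As usual for Huffman codes, the average length of the truncated (renormalized) source lies between its entropy and its entropy plus one.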

This method was successfully applied to characterize infinite optimal codes in [2] and [3]. The technique is straightforward once appropriate reduced sources are defined; the difficulty in each case is to guess the structure of these sources. In a sense, this is a self-bootstrapping procedure: one needs to guess the structure of the codes sought, and use that structure to define the reduced sources, which, in turn, serve to prove that the guess was correct. We will apply the Gallager-Van Voorhis method to prove optimality of codes for certain families of TDGDs in Sections IV and V. In each case, we will emphasize the definition and structure of the reduced sources, and show that they are compatible with the Huffman procedure. We will omit the discussion on convergence, and the formal induction proofs, since the arguments are essentially the same as those in [2] and [3].

III. Parameter-singularity of optimal codes for TDGDs

In the case of one-dimensional geometric distributions, the unit interval $(0, 1)$ is partitioned into an infinite sequence of semi-open intervals $(q_{k-1}, q_k]$, $k \ge 1$, such that the Golomb code $G_k$ is optimal for all values of the distribution parameter $q$ in $(q_{k-1}, q_k]$. Specifically, for $k \ge 0$, $q_k$ is the (unique) nonnegative root of the equation $q^k + q^{k+1} - 1 = 0$. Thus, we have $q_0 = 0$, $q_1 = (\sqrt{5} - 1)/2 \approx 0.618$, $q_2 \approx 0.755$, etc. A similar property holds in the case of two-sided geometric distributions [3]. There, the two-dimensional parameter space is partitioned into a countable sequence of patches such that all the distributions with parameter values in a given patch admit the same optimal code. In this section, we prove that, in sharp contrast to these examples, optimal codes for TDGDs are parameter-singular: a code that is optimal for a certain value of the parameter $q$ cannot be optimal for any other value of $q$. More formally, we present the following result.
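The thresholds $q_k$ can be computed numerically, since $q^k + q^{k+1}$ is increasing on $[0, 1]$. The sketch below finds $q_k$ by bisection. For a given $q$, it also returns the order $k$ of the optimal Golomb code, taken as the smallest $k$ with $q^k + q^{k+1} \le 1$, i.e., with $q \in (q_{k-1}, q_k]$. The function names are hypothetical.

```python
import math


def golomb_threshold(k: int, tol: float = 1e-13) -> float:
    """q_k: the nonnegative root of q^k + q^(k+1) - 1 = 0, by bisection on [0, 1]."""
    def f(q: float) -> float:
        return q ** k + q ** (k + 1) - 1.0  # increasing on [0, 1]

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def golomb_order(q: float) -> int:
    """Order k of the Golomb code G_k that is optimal for parameter q, i.e., q in (q_{k-1}, q_k]."""
    k = 1
    while q ** k + q ** (k + 1) > 1.0:
        k += 1
    return k


print([round(golomb_threshold(k), 3) for k in range(4)])
print(golomb_order(0.7))
```

As a consistency check, for $q = 2^{-1/k}$ (the case studied by Golomb) the rule returns order $k$. In that case $q^k + q^{k+1} = (1+q)/2 < 1$, while $q^{k-1} + q^k = (1/q + 1)/2 > 1$.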

Theorem 1: Let $q$ and $q_1$ be real numbers in the interval $(0, 1)$, with $q \ne q_1$, and let $T_q$ be an optimal tree for TDGD($q$). Then, $T_q$ is not optimal for TDGD($q_1$).

Remark. It follows from Theorem 1 that any set containing an optimal code for each distribution TDGD($q$), for all values of $q$, must be uncountable. This implies, in turn, that most optimal codes for TDGDs do not have finite descriptions, in sharp contrast with the one-dimensional case. From an algorithmic point of view, then, the key question is for what "interesting" countable sets of values of $q$ a full characterization of optimal codes is possible. In a theoretical sense, perhaps the ultimate such set would be that of all values of $q$ which have finite descriptions (more formally, the set of computable values of $q$ relative to some universal Turing machine; see, e.g., [14]). For this set, the goal would be to obtain a general procedure that, given a finite description of $q$ and a pair $(i, j)$, produces the corresponding codeword in an optimal code for TDGD($q$). A somewhat less ambitious theoretical goal, although probably not less valuable from a practical point of view, would be to characterize optimal codes for a dense countable set of values of $q$. Examples are all rational values of $q$, or all values of $q$ such that $\log q$ is rational. These comprehensive characterizations appear quite challenging, and remain open problems. In Sections IV and V we characterize optimal codes for a "smaller" infinite countable set of TDGDs, namely, the set of distributions TDGD($q$) such that $-\log q$ is either a positive integer or the inverse of one. It will turn out, as will be shown in Section VI, that this set provides good coverage of the interval $0 < q < 1$. By this we mean that, given an arbitrary value $q'$ in the interval, encoding TDGD($q'$) with the best available code from the characterized set results in relatively low added redundancy, and yields the expected redundancy gains over optimal symbol-by-symbol encoding with Golomb codes.

We will prove Theorem 1 through a series of lemmas, which will shed more light on the structure of optimal trees for TDGDs. For simplicity, we assume throughout that a fixed optimal tree $T_q$ is given (for a given value of $q$).

Lemma 1: Leaves with a given signature $s$ are found in at most two consecutive levels of $T_q$.

Proof: Let $d_0$ and $d_1$ denote, respectively, the minimum and maximum depths of a leaf with signature $s$ in $T_q$. Assume, contrary to the claim of the lemma, that $d_1 > d_0 + 1$. We transform $T_q$ into a tree $T'_q$ as follows. Pick a leaf with signature $s$ at level $d_0$, and one at level $d_1$. Place both signatures $s$ as children of the leaf at level $d_0$, which becomes an internal node. Pick any signature $s'$ from a level strictly deeper than $d_1$, and move it to the vacant leaf at level $d_1$. Tracking changes in the code lengths corresponding to the affected signatures, and their effect on the cost, we have

$$C_q(T'_q) = C_q(T_q) + q^{s}\,(d_0 - d_1 + 2) - q^{s'}\delta, \tag{3}$$

where $\delta$ is a positive integer. By our assumption, the quantity multiplying $q^{s}$ in (3) is non-positive, and we have $C_q(T'_q) < C_q(T_q)$, contradicting the optimality of $T_q$. Therefore, we must have $d_1 \le d_0 + 1$. ■

A gap in a tree $T$ is a non-empty set of consecutive levels containing only internal nodes of $T$, such that both the level immediately above the set (assuming the set does not include level 0) and the level immediately below it contain at least one leaf each. The corresponding gap size is defined as the number of levels in the gap. It follows immediately from Lemma 1 that, in an optimal tree, if the largest signature above a gap is $s$, then the smallest signature below the gap is $s + 1$.
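The notion of a gap depends only on the profile $\{n_\ell\}$. The sketch below locates gaps in a finite, truncated profile. As a simplifying assumption, it reports only runs of leafless levels that have a leaf level both immediately above and immediately below, and it ignores a leafless run that starts at the root. The function name is hypothetical.

```python
def find_gaps(profile):
    """Gaps of a (truncated) profile, as (first_level, size) pairs.

    profile[l] is the number of leaves at level l. Only runs of leafless levels
    with a leaf level immediately above and immediately below are reported.
    """
    gaps = []
    l = 0
    while l < len(profile):
        if profile[l] == 0 and l > 0 and profile[l - 1] > 0:
            start = l
            while l < len(profile) and profile[l] == 0:
                l += 1
            if l < len(profile):  # a leaf level closes the run
                gaps.append((start, l - start))
        else:
            l += 1
    return gaps


# Profile of G_1 . G_1 (q = 1/2), truncated: n_l = l - 1 leaves for l >= 2.
unary_pair = [0, 0] + [l - 1 for l in range(2, 12)]
print(find_gaps(unary_pair))          # []
print(find_gaps([0, 1, 0, 0, 2, 4]))  # [(2, 2)]
```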

Lemma 2: Let $k = 1 + \lfloor \log q^{-1} \rfloor$. Then, for all sufficiently large $s$, the size $g$ of any gap between leaves of signature $s$ and leaves of signature $s+1$ in $T_q$ satisfies $g \le k - 1$.

Proof: We consider the cases $q > 1/2$, $q = 1/2$, and $q < 1/2$ separately.

Case $q > 1/2$. In this case, we have $k = 1$, and the claim of the lemma means that there can be no gaps in the tree from a certain level on. Assume that there is a gap between level $d$ with signatures $s$, and level $d'$ with signatures $s+1$, $d' - d \ge 2$. By Lemma 1, all signatures $s+1$ are either in level $d'$ or in level $d'+1$. Without loss of generality, we can assume that there is a subtree of $T_q$ of height at most two, rooted at a node $v$ of depth $d' - 1 \ge d + 1$, and containing at least two leaves of signature $s+1$. Hence, since $2q > 1$, the weight of the subtree satisfies

$$w(v) \ge 2q^{s+1} > q^{s}.$$

Switching a leaf $s$ on level $d$ with node $v$ on level $d' - 1$ therefore decreases the cost of $T_q$, in contradiction with its optimality (when switching nodes, we carry also any subtrees rooted at them). Therefore, there can be no gap between the levels containing signatures $s$ and $s+1$, as claimed. Notice that this holds for all values of $s$, regardless of level.
Fig. 1. Tree transformations.



Case $q = 1/2$. In this case, the TDGD is dyadic, the optimal profile is uniquely determined, and it has no gaps (the optimal profile is that of $G_1 \cdot G_1$).

Case $q < 1/2$. Assume that $s \ge 2^k - 2$, and that there is a gap of size $g$ between signatures $s$ at level $d$ and signatures $s+1$ at level $d + g + 1$. Signatures $s+1$ may also be found at level $d + g + 2$. Without loss of generality, and by our assumption on $s$, we can assume that there is a subtree of $T_q$ rooted at a node $v$ at level $d + g + 1 - k$, containing at least $2^k$ leaves with signature $s+1$, including some at level $d + g + 1$. Thus, we have

$$w(v) \ge 2^k q^{s+1} > q^{s} = w(s),$$

the second inequality following from the definition of $k$. Therefore, we must have $d + g + 1 - k \le d$, or equivalently, $g \le k - 1$. Otherwise, exchanging $v$ and $s$ would decrease the cost, contradicting the optimality of $T_q$. ■
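The parameter $k$ of Lemma 2 is simply the smallest integer with $2^k q > 1$. This is exactly the inequality $w(v) \ge 2^k q^{s+1} > q^{s}$ used in the case $q < 1/2$, and it reduces to $2q > 1$ in the case $q > 1/2$, where $k = 1$. A quick numerical check follows; the helper name is hypothetical.

```python
import math


def gap_parameter(q: float) -> int:
    """k = 1 + floor(log2(1/q)) from Lemma 2."""
    return 1 + math.floor(math.log2(1.0 / q))


for q in (0.9, 0.7, 0.5, 0.3, 0.25, 0.1):
    k = gap_parameter(q)
    # k is the smallest integer with 2^k q > 1, which is what the proof uses.
    assert 2 ** k * q > 1 and 2 ** (k - 1) * q <= 1
    print(q, k, k - 1)  # parameter, k, and the gap-size bound k - 1
```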

Next, we bound the rate of change of signature magnitudes as a function of depth in an optimal tree. Together with the bound on gap sizes in Lemma 2, this will lead to the proof of Theorem 1. It follows from Lemma 1 that, for every signature $s \ge 0$, there is a level of $T_q$ containing at least one half of the $s+1$ leaves with signature $s$. We denote the depth of this level by $L(s)$ (with some fixed policy for ties), dependence on $T_q$ being understood from the context.

Lemma 3: Let $s$ be a signature, and $\ell \ge 2$ a positive integer such that $s \ge 2^{\ell+2} - 1$, and such that $L(s') = L(s) + \ell$ for some signature $s' > s$. Then, for $T_q$, we have

$$\frac{\ell - 2}{\log q^{-1}} \le s' - s \le \frac{\ell + 1}{\log q^{-1}}. \tag{4}$$



Proof: Since $s' > s \ge 2^{\ell+2} - 1 > 2^{\ell-1} - 1$, by the definition of $L(s')$, there are more than $2^{\ell-2}$ leaves with signature $s'$ at level $L(s')$. We perform the following transformation (depicted in Figure 1(A)) on the tree $T_q$, yielding a modified tree $T'_q$. Choose a leaf with signature $s$ at level $L(s)$, and graft to it a tree with two subtrees. The left subtree consists of a leaf with signature $s$ ("moved" from the root of the subtree). The right subtree is a balanced tree of height $\ell - 2$ with $2^{\ell-2}$ leaves of signature $s'$. These signatures come from $2^{\ell-2}$ leaves at level $L(s')$ of $T_q$, which are removed. It is easy to verify that the modified tree $T'_q$ defines a valid, albeit incomplete, code for the alphabet of a TDGD. Next, we estimate the change, $\Delta$, in cost due to this transformation. We have

$$\Delta = C_q(T'_q) - C_q(T_q) = q^{s} - 2^{\ell-2} q^{s'}.$$

The term $q^{s}$ is due to the increase, by one, in the code length for the signature $s$, which causes an increase in cost. The term $-2^{\ell-2} q^{s'}$ is due to the decrease, by one, in code length for $2^{\ell-2}$ signatures $s'$, which produces a decrease in cost. Since $T_q$ is optimal, we must have $\Delta \ge 0$, namely,

$$2^{\ell-2} q^{s'} \le q^{s}, \quad \text{and thus,} \quad 2^{\ell-2} q^{s'-s} \le 1.$$

Taking base-2 logarithms gives $\ell - 2 \le (s' - s)\log q^{-1}$, which is the lower bound in (4). (Note: clearly, the condition $s \ge 2^{\ell-1} - 1$ would have sufficed to prove the lower bound. The stricter condition of the lemma will be required for the upper bound, and was adopted here for uniformity.)

To prove the upper bound, we apply a different modification to $T_q$. Here, we locate $2^{\ell+1}$ signatures $s'$ at level $L(s')$, and assume, without loss of generality, that these signatures are the leaves of a balanced tree of height $\ell + 1$, rooted at a node $u$ of depth $L(s) - 1$. The availability of the required number of leaves at level $L(s')$ is guaranteed by the conditions of the lemma. We then exchange $u$ with a leaf of signature $s$ at level $L(s)$, obtaining a tree $T''_q$. The situation after the transformation is depicted in Figure 1(B). The resulting change in cost is computed as follows:

$$\Delta = C_q(T''_q) - C_q(T_q) = -q^{s} + 2^{\ell+1} q^{s'}.$$

As before, we must have $\Delta \ge 0$, that is, $2^{\ell+1} q^{s'-s} \ge 1$. Taking logarithms yields $(s' - s)\log q^{-1} \le \ell + 1$, which is the upper bound in (4). ■
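Both bounds in (4) scale as $1/\log q^{-1}$. Hence the admissible values of $s' - s$ form a window of width $3/\log q^{-1}$ that does not depend on $\ell$, and this is what the proof of Theorem 1 exploits. The small sketch below evaluates (4) and the side condition $s \ge 2^{\ell+2} - 1$. The function names are hypothetical.

```python
import math


def lemma3_bounds(q: float, ell: int):
    """Interval [(ell - 2)/log2(1/q), (ell + 1)/log2(1/q)] for s' - s, eq. (4)."""
    if ell < 2:
        raise ValueError("Lemma 3 assumes ell >= 2")
    lq = math.log2(1.0 / q)
    return (ell - 2) / lq, (ell + 1) / lq


def lemma3_min_signature(ell: int) -> int:
    """Smallest s allowed by the condition s >= 2^(ell + 2) - 1."""
    return 2 ** (ell + 2) - 1


def signature_gap_range(q: float, ell: int):
    """Integer values of s' - s (at least 1, since s' > s) compatible with (4)."""
    lo, hi = lemma3_bounds(q, ell)
    return list(range(max(1, math.ceil(lo)), math.floor(hi) + 1))


# For q = 2^(-1/4), log2(1/q) = 1/4, so (4) reads 4(ell - 2) <= s' - s <= 4(ell + 1).
print(lemma3_bounds(2 ** (-1 / 4), 3), lemma3_min_signature(3))
print(signature_gap_range(0.5, 5))
```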
We are now ready to prove Theorem 1.

Proof of Theorem 1: We assume, without loss of generality, that $q_1 > q$, and we write $q_1 = q(1 + \varepsilon)$, $0 < \varepsilon < q^{-1} - 1$. In $T_q$, choose a sufficiently large signature $s$ (the meaning of "sufficiently large" will be specified in the sequel), and a node of signature $s$ at level $L(s)$. Let $s' > s$ be a signature such that $\ell = L(s') - L(s) \ge 2$. We apply the transformation of Figure 1(A) to $T_q$, yielding a modified tree $T'_q$. We claim that, when weights are taken with respect to TDGD($q_1$) and with an appropriate choice of the parameter $\ell$, $T'_q$ will have strictly lower cost than $T_q$. Therefore, $T_q$ is not optimal for TDGD($q_1$). To prove the claim, we compare the costs of $T'_q$ and $T_q$ with respect to TDGD($q_1$). Reasoning as in the proof of the lower bound in Lemma 3, we write

$$\Delta = C_{q_1}(T'_q) - C_{q_1}(T_q) = q_1^{s} - 2^{\ell-2} q_1^{s'} = q_1^{s}\left(1 - 2^{\ell-2} q_1^{s'-s}\right) \le q_1^{s}\left(1 - 2^{\,\ell - 2 - \frac{\ell+1}{\log q^{-1}}\log q_1^{-1}}\right), \tag{5}$$

where the last inequality follows from the upper bound in Lemma 3. It follows from (5) that we can make $\Delta$ negative if

$$\ell - 2 - \frac{\ell + 1}{\log q^{-1}}\,\log q_1^{-1} > 0.$$

Writing $q_1$ in terms of $q$ and $\varepsilon$, so that $\log q_1^{-1} = \log q^{-1} - \log(1 + \varepsilon)$, the left-hand side becomes $-3 + (\ell + 1)\log(1+\varepsilon)/\log q^{-1}$. The above condition is therefore equivalent to

$$\ell > 3\,\frac{\log q^{-1}}{\log(1 + \varepsilon)} - 1. \tag{6}$$

Hence, choosing a large enough value of $\ell$, we get $\Delta < 0$, and we conclude that the tree $T_q$ is not optimal for TDGD($q_1$), subject to an appropriate choice of $s$, which we discuss next.
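Condition (6) is easy to evaluate. The sketch below computes the smallest admissible $\ell \ge 2$ for given $q$ and $\varepsilon$. It then checks this value against the exponent in (5), which simplifies to $-3 + (\ell + 1)\log(1+\varepsilon)/\log q^{-1}$; the exponent is positive exactly when (6) holds. The function names are hypothetical.

```python
import math


def exponent(q: float, q1: float, ell: int) -> float:
    """Exponent ell - 2 - (ell + 1) * log2(1/q1) / log2(1/q) appearing in (5)."""
    return ell - 2 - (ell + 1) * math.log2(1.0 / q1) / math.log2(1.0 / q)


def min_ell(q: float, eps: float) -> int:
    """Smallest integer ell >= 2 satisfying (6): ell > 3 log(1/q) / log(1 + eps) - 1."""
    if not (0 < q < 1 and 0 < eps < 1 / q - 1):
        raise ValueError("need 0 < q < 1 and 0 < eps < 1/q - 1")
    bound = 3 * math.log2(1.0 / q) / math.log2(1.0 + eps) - 1
    return max(2, math.floor(bound) + 1)


q, eps = 0.5, 0.1
ell = min_ell(q, eps)
q1 = q * (1 + eps)
print(ell, exponent(q, q1, ell), exponent(q, q1, ell - 1))
```

A positive exponent makes the factor in parentheses in (5) negative, so $\Delta < 0$ for the chosen $\ell$.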

The argument above relies strongly on Lemma 3. We recall that, for this lemma to hold, $\ell$ and the signature $s$ must satisfy the condition $s \ge 2^{\ell+2} - 1$. Now, it could happen that, after choosing $\ell$ according to (6) and then $s$ according to the condition of Lemma 3, the level $L(s) + \ell$ does not contain $2^{\ell-2}$ signatures as required (e.g., when the level is part of a gap). This would force us to increase $\ell$, which could then make $s$ violate the condition of the lemma. We would then need to increase $s$, and re-check $\ell$, in a potentially vicious circle. The bound on gap sizes of Lemma 2 allows us to avoid this trap. The bound in the lemma depends only on $q$ and thus, for a given TDGD, it is a constant, say $g_q$. First, we choose a value $\ell_0$ satisfying the constraint on $\ell$ in (6). Then, we choose $s \ge 2^{\ell_0 + g_q + 4}$. Now, we try $\ell = \ell_0, \ell_0 + 1, \ell_0 + 2, \ldots$, in succession, and check whether level $L(s) + \ell$ contains enough of the required signatures. By Lemmas 1 and 2, an appropriate level $L(s')$ will be found for some $\ell \le \ell_0 + g_q + 2$. For such a value of $\ell$, we have $2^{\ell+2} - 1 \le 2^{\ell_0 + g_q + 4} - 1 < s$, satisfying the condition of Lemma 3. This condition, in turn, also guarantees that there are at least $2^{\ell-2}$ signatures $s'$ at $L(s')$, as required.
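The bookkeeping in this last step can be checked mechanically. The sketch below computes $\ell_0$, $g_q$, and $s = 2^{\ell_0 + g_q + 4}$, and verifies that every candidate $\ell \le \ell_0 + g_q + 2$ meets the condition of Lemma 3. It takes $g_q = k - 1$ from Lemma 2 and, as an assumption, ignores the separate "sufficiently large $s$" requirement of that lemma. The function name is hypothetical.

```python
import math


def choose_signature(q: float, eps: float) -> dict:
    """Parameters used in the last step of the proof of Theorem 1 (illustrative).

    ell0 satisfies (6); g_q = k - 1 is the gap bound of Lemma 2; s is chosen as
    2^(ell0 + g_q + 4). Every ell tried, up to ell0 + g_q + 2, then satisfies
    the condition s >= 2^(ell + 2) - 1 of Lemma 3.
    """
    bound = 3 * math.log2(1.0 / q) / math.log2(1.0 + eps) - 1
    ell0 = max(2, math.floor(bound) + 1)
    g_q = math.floor(math.log2(1.0 / q))   # k - 1 with k = 1 + floor(log2(1/q))
    s = 2 ** (ell0 + g_q + 4)
    ells = range(ell0, ell0 + g_q + 3)     # ell0, ..., ell0 + g_q + 2
    assert all(s >= 2 ** (ell + 2) - 1 for ell in ells)
    return {"ell0": ell0, "g_q": g_q, "s": s, "max_ell": ell0 + g_q + 2}


print(choose_signature(0.5, 0.1))
```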



IV. Optimal codes for TDGDs with $q = 2^{-1/k}$

It follows from the results of Section III that it is infeasible to provide a compact description of optimal codes for TDGDs covering all values of the parameter $q$, as can be done with one-dimensional geometric distributions [1], [2] or their two-sided variants [3]. Instead, we describe optimal prefix codes for a discrete sequence of values of $q$ that provides good coverage of the parameter range. In this section, we study optimal codes for TDGDs with parameters $q = 2^{-1/k}$ for integers $k \ge 1$, i.e., $q \ge 1/2$. In Section V we consider parameters of the form $q = 2^{-k}$, $k > 1$, covering the range $q < 1/2$. The two parameter sequences coincide at $k = 1$, $q = 1/2$, which we choose to assign to the case covered in this section.

A. Initial characterization of optimal codes for $q = 2^{-1/k}$

The following theorem characterizes optimal codes for TDGDs of parameter $q = 2^{-1/k}$, $k \ge 1$, in terms of unary codes and Huffman codes for certain finite distributions. In Subsection IV-C we further refine the characterization by providing explicit descriptions of these Huffman codes.

Theorem 2: An optimal prefix code $C_k$ for TDGD($q$), with $q = 2^{-1/k}$, $k \ge 1$, is given by

$$C_k(i, j) = T_k(i \bmod k,\ j \bmod k) \cdot G_1(\lfloor i/k \rfloor) \cdot G_1(\lfloor j/k \rfloor),$$

where $G_1$ is the unary code, and $T_k$, referred to as the top code, is an optimal code for the finite source defined by the following symbol set and respective weights:

$$\mathcal{A}_k = \{(i, j) \mid 0 \le i, j < k\}, \qquad w(i, j) = q^{i+j}. \tag{7}$$



Remarks. 

1) Theorem |2] can readily be generalized to blocks of d > 2 
symbols. For simplicity, we present the proof for d = 2. 



'g 



Ti T2 



Fig. 2. Graphical representations for trees with associated weights. 



2) Notice that Ck {i^j) concatenates the "unary" parts of the 
codewords for i and j in a Golomb code of order k (as if 
encoding i and j separately), but encodes the "binary" 
part jointly by means of T^, which, in general, does 
not yield the concatenation of the respective "binary" 
parts Qk{i) and Qk{j)- However, when k = 1 and 
k = 2, Ck is equivalent to the full concatenation Gk-Gk- 
When A: = 1, the code Tk is void, and Ci = Gi Gi. 
The parameter in this case is q = ^, the geometric 
distribution is dyadic, and the code redundancy is zero. 
When k = 2, we have q = 1/a/2 and the finite 
source Ak has four symbols with respective weights 
{1, V2/2, V2/2, 1/2}. This source is quasi-uniform, 
and, therefore, it admits Q4 as an optimal tree. This is 
a balanced tree of depth two, which can also be written 
as Qa = Q2 • Q2- Thus, we have C2 = G2 • G2. Later 
on in the section, in Corollary [T] we will show that 
this situation will not repeat for larger values of k: the 
"symbol by symbol" code Gk ■ Gk is strictly suboptimal 
for TDGD(2-i/^) when k > 2. 

In deriving the proof of Theorem |2] and in subsequent sec- 
tions, we shall make use of the following notations to describe 
and operate on some infinite trees with weights associated to 
their leaves. We denote by \v] the trivial tree consisting of a 
single node (leaf) of weight v. Given a tree T and a scalar 
g, gT denotes the tree T with all its weights multiplied by 
g. Given trees Ti and T2, the graphic notation in Figure [2j A) 
represents a tree T consisting of a root node with Ti as its 
left subtree and T2 as its right subtree, each contributing its 
respective leaf weights. The multiset of weights associated 
with T is the union of the multisets associated with Ti and T2. 
We will also use the notation [ Ti T2 ] to represent the forest 
consisting of the separate trees Ti and T2, which has the same 
associated multiset of weights as the tree T of Figure [2jA), 
but a different underlying graph. We denote by Tg the tree of 
a unary code whose leaf at each depth i > 1 has weight g\ 
and by the structure in Figure ^B). It is readily verified 
that Tg corresponds to the concatenation of two unary codes, 
with each of the z — 1 leaves at depth i > 2 of Tg carrying 
weight g\ In particular, as shown in Figure [sj the tree q~'^Tq 
corresponds to the optimal tree for the dyadic TDGD with 
q = ^, where each leaf is weighted according to the signature 
of the symbol it encodes. 

The following lemma follows directly from the above 
definitions, applying elementary symbolic manipulations on 
geometric sums. 

Lemma 4: For any real number ^, < ^ < 1, we have 

-l/k 



In particular, if q = 2 
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Fig. 3. 



The tree q~'^T^. 



we have ^(7^1) ^^^(7^l) 1. 

We rely on this observation in the proof of Theorem |2] 
below. In the proof, when defining virtual symbols, we further 
overload notation and regard trees with associated weights, 
such as q^T^k, also as multisets of signatures, with a signature 
s for each leaf of the tree with weight . 

Proof of Theorem |2]- We use the Gallager-Van Voorhis 
construction \2\. For 5 > 0, define the reduced source 



where 



l-Ls = {i^A\i<s} 



(signatures in 1-Ls occur with the same multiplicity as in A), 
and 

k-l 



i=0 



k times s+fe+i+1 
times 



times 



The multisets (of signatures) q^^^T^k and q^^^T^k play the 
role of virtual symbols in the reduced sources, as discussed in 



Subsection |II-C| (we omit the qualifier 'virtual' in the sequel). 
It is readily verified that all the weights of symbols in J^s are 
smaller than the weights of signatures in l-Lg- Since q = 2~^^^, 
by Lemma |4j we have w{q^^^T^k) = ujiq^^^T^k) = w{s-^i). 
Thus, we can apply steps of the Huffman procedure to in 
such way that the s + i + 1 signatures 5 + i are merged with 
symbols q^^'^T^k, resulting in trees q^^^~^T^k- 

The remaining k symbols q^^^T^k can be merged with the k 
symbols q^^'^T^k, resulting in k trees q^^^~^T^k when i ranges 
from k—1 down to 0. After this sequence of Huffman mergers, 
Ws is transformed into Ws-k^ as long as 5 > /c. Starting from 
s = tk fox some t > 0, the procedure eventually leads to Wq. 
Formally, our reduced source Wtk-, t > 0, corresponds to St 
in our description of the Gallager-Van Voorhis construction in 
Section [TTCl Thus, the iteration leads to Sq, as called for in 
the construction. It is readily verified that this source admits an 
additional sequence of Huffman mergers, as described above, 
leading (with a slight abuse of notation) to 



2 = 



Continuing with the Huffman procedure, each symbol q^~^T^k 
in 5-1 can be merged with a symbol q^~^T^k, further leading, 
by the definition of (see Figure |2];B)), to a reduced source 

S* = { q-^'^Tf,, q-^^+^T^u, q-^'+^T^,... 



2 

times 



3 

times 



' ' ' ^ q ^ ^Tqk , q ^Tqk , . . . , ^ ^Tqk , q '^T^k > . 



k 

times 



k-l 
times 



2 

times 



1 

time 



We now take a common "factor" q~'^^T^k from each symbol of 
*S*. By the discussion of Figures |2] and [3] this factor corre- 
sponds to a copy of Gi • Gi, with weights that get multiplied 
by q^ every time the depth increases by 1. After the common 
factor is taken out, the source 5* becomes the source Ak 
of (|7]), to which the Huffman procedure needs to be applied 
to complete the code construction. Thus, the code described 
in the theorem is optimal. ■ 

To make the result of Theorem [2] completely explicit, it 
remains to characterize an optimal prefix code for the finite 
source of ([t]). The following lemma presents some basic 
properties of Ak and its optimal trees. Recall the definitions 
of a-uniformity and fringe thickness from Section [ll] 

Lemma 5: The source Ak is 4-uniform, and it has an 
optimal tree T of fringe thickness /t < 2. 

Proof: It follows from ^ and the relation q^ = ^ that the 
maximal ratio between weights of symbols in Ak is = 
4^^^ < 4. Hence, Ak is 4-uniform. The claim on the optimal 
tree holds trivially for /c < 2, in which case the optimal tree 
for Ak is uniform. To prove the claim for A: > 2, consider 
the multiset A^ C Ak consisting of the lightest 2[^i^^] 
signatures in Ak, i.e., 

Al=JC[j{k,k,...,k, . . . , k^l , . . . 

k — l times k — 2 times 

. 2/c-3 , 2/c-3 , 2/C-2}, 

2 times 1 time 

where JC 



2 times 

{k-l} if /cmod4 G {2,3}, or JC is empty 
otherwise. The sum of the two smallest weights of signatures 
in Al satisfies 

2/C-2 I ^2k-3 ^2k-2( 



w{2k-2) ^w{2k-3) 



q- ^=q- '{l^q-') 



^2k-3 _ 
-1)^.-2 



> w{k - 2) . 



k i+1 
times times 



The sum of the two largest weights in Al, on the other hand, 
is either q^ if k mod 4 G {0, 1}, or ^(1 + q~^) otherwise. 
Therefore, if the Huffman procedure is applied to Ak, every 
pair of consecutive elements of Al will be merged, without 
involving a previously merged pair. The ratio of the largest 
to the smallest weight remaining after these mergers is at 
most ^{l-\-q~^)/q^~^ = q-\-l < 2. Hence, the resulting 
source is quasi-uniform and has a quasi-uniform optimal tree. 
Therefore, completing the Huffman procedure for Ak results 
in an optimal tree of fringe thickness at most two. ■ 
To complete the explicit description of an optimal tree for 
Ak, we will rely on a characterization of trees T with /t < 2 
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that are optimal for 4-uniform sources]^ This characterization 
is presented next. 

B. Optimal trees with /t ^ 2 for A-uniform sources 

To proceed as directly as possible to the construction of an 
optimal tree for Ak, we defer all the proofs of results in this 
subsection to Appendix |A| We start by characterizing all the 
possible profiles for a tree T with N leaves, and fr Let 
T be such a tree, let m = [log A/"], and denote by ni the 
number of leaves at depth i in T. 

Lemma 6: The profile of T satisfies ni = for £ < m—2 
and i > m+1, and either nm-2 = or nm+i = (or both, 
when /t < 1). 

It follows from Lemma |6] that T is fully characterized by the 
quadruple (n^_2, n^_i, n^, n^+i), with either nm-2 = or 
nm-\-i = 0. We say T is long if nm-2 = 0, and that T is short 
if nm-\-i = 0. Defining M = m — a, where a = 1 if T is short, 
or if it is long, a tree with fr ^ can be characterized 
more compactly by a triple of nonnegative integers Nt = 
(nM-i, ^M, ^M+i). We will also refer to this triple as the 
(compact) profile of T, with the associated parameters A^, m, 
and cr understood from the context. Notice that when nm-2 = 
^m+i = 0, T is the quasi-uniform tree Qat, and (abusing the 
metaphor), it is considered both long and short (i.e., it has 
representations with both a = and a = 1). 

Lemma 7: Let T be a tree with fr ^ "2- For a G {0, 1} 
and M = m — a, define 

2M 



{N - 2^)a and 



2A'- 



Then, T is equivalent to one of the trees T^j 
profiles 



defined by the 



N7 



(nM-l, ^M, ^M+l) 



= 2 



2A'-2^-3c, 2c 



a G {0, 1}, <c<Ca 



(8) 



Remarks. 

1) Equation ^ characterizes all trees with A^ leaves and 
/t < 2 in terms of the parameters a and c. The 
parameter c has different ranges depending on a\ we 
have N - 2^-^ < c < [ ^^-^^'^ \ when a = 1, 
and < c < [ ^^"^"" j when cr = 0. The use of the 
parametrized quantities M, c^, and c^r will allow us to 
treat the two ranges in a unified way in most cases. 
Also, notice that Ti and Tq^cq represent the same 
tree, corresponding, respectively, to interpretations of the 
quasi-uniform tree Qat as short or long. 

2) The parameter c represents the number of internal (non- 
leaf) nodes at level M of T. An increase of c by one 
corresponds to moving a pair of sibling leaves previously 
rooted at level M— 1 to a new parent at level M (thereby 
increasing the number of internal nodes at that level by 

^Notice that not every 4-uniform source admits an optimal tree with /t < 2 
(although the ones of interest in this section do). For example, an optimal 
tree for the 4-uniform source with probabilities ^(4, 3, 1, 1, 1) must have 
/t >2. 



one). The number of leaves at level M decreases by 
three, and the numbers of leaves at levels M — 1 and 
M + 1 increase by one and two, respectively. 
Consider now a distribution on A^ symbols, with associated 
vector of probabilities (or weights) p = (pi,P2, • • • ,PAr ), 
Pi ^ P2 ^ • • • ^ Pn- Let I/^ c denote the average code length 
of Tcr,c under p (with shorter codewords naturally assigned to 
larger weights), and let 



La,c-1, CrG{0, 1}, C^<C< 



(9) 



It follows from these definitions, and the structure of the 
profile ^ (see also Remark [2] above), that for a e {0, 1} 

and < c < c^, we have 



a,c = PN-2C+1 + PN-2C+2 " P2^-Ar+c • 



(10) 



A useful interpretation of (10) follows directly from the 
profile ([8]): for Tcr,c, ^cr,c is the difference between the sum 
of the two heaviest weights on level M + 1 and the lightest 
weight on level M — 1. 

Let sg{x) be defined as —1,0, or 1, respectively, for nega- 
tive, zero, or positive values of x, and consider the following 
sequence (recalling that Cq = 0): 

S = -Sg(Di,cJ, -Sg(Di,ci-l), . . . , -Sg(Di,c^ + l), 

Sg(I)0,l), Sg(I)o,2), Sg(I)o,co). (11) 

Lemma 8: The sequence s is non-decreasing. 

The definition of the sequence s induces a total ordering of 
the pairs (a, c) (and, hence, also of the trees Tcr,c), with pairs 
with (7 = 1 ordered by decreasing value of c, followed by pairs 
with cr = in increasing order of c. The two subsequences 
"meet" at c^, which defines the same tree regardless of the 
value of a (in the pairs ordering, we take (1,0^) as identical 
to (0, Cq) = (0, 0)). We denote this total order by ^. Recalling 
that the quantities D^j c are differences in average code length 
between consecutive codes in this ordering. Lemma [8] tells 
us that, as we scan the codes in order, we will generally 
see the average code length decrease monotonically, reach a 
minimum, and then (possibly after staying at the minimum 
for some number of trees) increase monotonically. In the 
following theorem, we formalize this observation, and identify 
the trees Tcr,c that are optimal for p. 

Theorem 3: Let p be a 4-uniform distribution such that p 
has an optimal tree T with /t < 2. Define pairs (cr*,c*) and 
(cr*,c*) as follows: 

(cr*,c*) = (l,Ci) if Di^ci ^ 0^ 
(a*,c*) = (0,co) ifl)o,co<0; 

otherwise, if l^i,ci < 0' i^-^^-) t)e such that 
( — l)(^-)sg{Dcr_^c-) is the last negative entry in s, and define 

(cr*, C*) = (cr_, C_ - cr_) ; 

if Do^co > 0' let (a-+,c+) be such that (-l)(^+)sg(i:>^^,c+) is 
the first positive entry in s, and define 

(cr*, c*) = (a+, c+ - 1 + a+) . 

Then, all trees Tcr,c with (cr*,c*) ^ (c^? c) ^ (cr*,c*) are 
optimal for p. 
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TABLE I 

Finding optimal trees T^,c for N = 19,p = ^(4,4,3,3,3,3,3,3,3,3,3,2,2,2,2,2,2,1,1) (optimal tree parameters emphasized in boldface). 



(a,c) 


(1,7) 


(1,6) 


(1,5) 


(1,4) 




(0,1) 


(0,2) 


(riM-l, TiM^TiM+l) 


(4,1,14) 


(3,4,12) 


(2,7,10) 


(1,10,8) 


(13,6,0) 


(14,3,2) 


(15,0,4) 


49 • L^,c 


214 


211 


208 


206 


206 


206 


208 


49 • L>.,c 


3 


3 


2 










2 


s 


-1 


-1 


(cr_,C_) 




(cr*, c*) 






(cr*,C*) 


(^^+,c+) 



Notice that, by Lemma [s] the range (cr*,c*) ^ (c^, c) ^ 
(cr*, c*) is well defined and never empty, consistently with the 
assumptions of the theorem and with Lemma [7] The example 
in Table |I| fists afi the trees T^,c with /t < 2 for = 19, as 
characterized in Lemma |7] and shows how Theorem [3] is used 
to find optimal trees for a given 4-uniform distribution on 19 
symbols. 

C. The top code 

By Lemma [sj Theorem [3] applies to the source Ak defined 
in ([7]). We will apply the theorem to identify parameters 
(cr/e,c/c) that yield an optimal tree Tfj^^ck ^k- 

For the remainder of the section, we take N = k'^, and 
let p = (pi , p2 , • • • , P/c2 ) denote the vector of (unnormalized) 
symbol weights in Ak, in non-increasing order. Thus, we have 

p = ((7^g\g\...,g^g^...,g^...,^2fe-3^^2fe-3^^2fe-2)^ 

Here, is repeated j + 1 times for < j < k—1, and 
2k — 1 — j times for k < j < 2k— 2. The following lemma, 
which follows immediately from this structure, establishes the 
relation between indices and weights in p. 

Lemma 9: For < i < k{k -\- l)/2, we have p^+i = , 
where j is the unique integer in the range < j < k — 1 
satisfying 

. _ j{j + 1) 



for some 



< r < j . 



(12) 



For < i' < k{k 



l)/2, we have Pk^- 



^2k-2- 



\q^ ^ , where / is the unique integer in the range < 
j' < k — 1 satisfying 



fif + 1) 



for some r\ < r' < j' . (13) 



Ak are long (a = 0); otherwise, they are short (a = 1). 
Proof: Assume = m. Then, we can write 

2m < 2^+^°^^ 2Q 



so 2"^ -k^ <k^ - k{k - l)/2. If + 1 > ci, then aU trees 
Tcr,c in ([8j) are long. Otherwise, I^i c^+i is well defined, and 
we have 



-^l,c-L + l = — ^l,/e2_2'^-i + l 

= Pi - (P2--/e2-l +P2--2O 

< Pi- 2pk2_kik-l)/2 =Pi- 2(7^"^ = 1 



(15) 



where the first and second equalities follow from the definition 
of Ci and from ([TO]), the first inequality from the ordering of 
the weights and from ([14]), the third equality from Lemma l9] 
and the last equality from the relation q^=\- By Lemma pj 
we conclude that optimal trees for Ak are long in this case. 
Similarly, when M' = m — 1, we have 



2^ > 2g > 2k^ -k{k-l)/2-2, 
^1 > k^ -k{k- l)/2 - 1, and 



(16) 



so 2^ - / 

Pfe2-/c(fe-l)/2-l 

in dSl) are short. Otherwise, similarly to (p3]), we have 



-/C2 + 1 



< 



q^ = \. If Co = Cq = 0, then aU trees 



^0,1 =Pfe2-i+Pfe2-p2--fe2+i > 2q 



,2k-2 



1 



1 



We define some auxiliary quantities that will be useful in the 
sequel. Let m [log P], Q = k'^ - \k{k - 1) /4] , and M' = 
\\0g2 Q] , with dependence on k understood from the context. 
We assume that k > 2, since the optimal codes for /c = 1 
and k = 2 have already been described in Subsection IV-A 
It is readily verified that we must have either = m or 
M' = m — 1. The next lemma shows that the relation between 
M' and m determines the parameter a of the optimal trees T^r^c 
for Ak. 

Lemma 10: If M' = m, then trees Tcr,c that are optimal for a 



which implies that optimal trees are short in this case. ■ 
It follows from Lemma [To| that we can take m — M' as the 
parameter a for all trees Tcr,c that are optimal for p. Notice that 
is analogous to the parameter M defined in Lemma |7j but 
slightly stricter, in that, in cases where a quasi-uniform tree is 
optimal, m — M' will assume a definite value in {0, 1} (which 
will vary with k), while, in principle, a representation with 
either value of a is available. This very slight loss of generality 
is of no consequence to our derivations, and, in the sequel, we 
will identify M with M^ i.e., we will take M = [logQ]. It 
also follows from Lemma 10 that when applying Theorem [3] 
to find optimal trees for p, we only need to focus on one of the 
two segments (corresponding to a=0 or cr=l) that comprise 
the sequence s in ( pT^ , the choice being determined by the 
value of k. This will simplify the application of the theorem. 

Lemmas [9] and [T0| together with Theorem [3] suggest a 
clear way, at least in principle, for finding an optimal tree 
Trr.r. foY Ak- Tfic parameter a is determined immediately as 
m — M (recalling that m and M are determined by k). 



2k' 



Now, recalling the expression for Dcr,c in (10), we observe 
that as c increases, the weights Pk2-2c-\-i and Pk'^-2c-\-2 also 
increase, while p2^-/c2+c' which gets subtracted, decreases. 
Thus, since, by Theorem [3] an optimal value of c occurs when 



2\k{k l)/4] < 2k k{k l)/2 , (14) jj^ ^ changes sign, we need to search for the value of c for 
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TABLE II 

Optimal code parameters and profiles for Ak, 3 < < 10. 



k 


M 


i 


r 




Ck 


{uM-i^nM^riM+i) 


2 


2 














(0,4,0) 


3 


3 








1 


1 


(0,7,2) 


4 


4 


1 








1 


(1,13,2) 


5 


5 


3 


1 








(7,18,0) 


6 


5 


1 





1 


5 


(1,25,10) 


7 


6 


5 











(15,34,0) 


8 


6 


2 


2 





5 


(5,49,10) 


9 


6 








1 


17 


(0,47,34) 


10 


7 


7 


1 





1 


(29,69,2) 




2^-1 2^-1 



Fig. 4. Trees Vk and . 



which the increasing sum of the first two terms "crosses" the 
value of the decreasing third term. This can be done, at least 
roughly, by using explicit weight values from Lemma [9] with 
i' e {2c - 1, 2c - 2} and z = 2^ - /c^ + c, and solving a 
quadratic equation, say, for the parameter j (the parameter 
j' will be tied to j by the constraint D^j^c ~ 0). A finer 
adjustment of the solution is achieved with the parameters r 
and r\ observing that a change of sign of Dcr^c can only occur 
near locations where the weights in p change (i.e., "jumps" 
in either j or j^), which occur at intervals of length up to 
k. At the "jump" locations, either r or must be close to 
zero. While there is no conceptual difficulty in these steps, 
the actual computations are somewhat involved, due to various 
integer constraints and border cases. Theorem [4] below takes 
these complexities into account and characterizes, explicitly 
in terms of k, the parameter pair (a/c, c^) of an optimal code 
Ta,,cfc for Ak. 

Theorem 4: Let q = 2"^/^, Q = P - \k{k - l)/4], m = 
[log/c^], and M = [logQ]. Define the function 

{k-x-2){k-x-l) 



A(x) =2r-2^^^+^+x(x + l) 

Let xq denote the largest real root of A(x), and let ^ 
Set 



-A(j) + 1 



), ifA(0<2C, 



Otherwise. 



(17) 
[xo\. 



(18) 



Then, the tree T^y^^ck, as defined by the profile ([5]) with a 

cFk = m — M and 



c = Ck = k^-2^^ 



(19) 



is optimal for Ak- Furthermore, Ck is the smallest value of c 
for any optimal tree T^r^^c for Ak- 

The proof of Theorem [4] is presented in Appendix |B] In 
the theorem (and its proof), we have chosen to identify the 
optimal tree T^j^^c with the smallest possible value of c. It 
can readily be verified that this choice minimizes the variance 
of the code length among all optimal trees T^^^c With only 
minor changes in the construction and proof, one could also 
identify the largest value of c for an optimal tree, and, thus, 
the full range of values of c yielding optimal trees T^j^^c- For 
conciseness, we have omitted this extension of the proof. 

Examples of the application of Theorem [4] are presented in 
Table |ll| which lists the parameters M, j, r, a/e, c^, and the 



profile of the optimal tree T^r^ , ck defined by the theorem, for 
3 < /c < 10. 

The tools derived in the proof of Theorem |4] also yield 
the following result, a proof of which is also presented in 
Appendix |B] 

Corollary 1: Let /c > 2 and q = 2"^/^. Then, Gk • Gk is 
not optimal for TDGD(^). 

D. Average code length 

The following corollary gives explicit formulas for the av- 
erage code length of the codes Ck characterized in Theorem [2] 
and Theorem [4] The proof is deferred to Appendix [C] 

Corollary 2: Let M, A(x), j, and r be as defined in 
Theorem H Then, the average code length Cq{Ck) for the 
code Ck under TDGD(g), for arbitrary q, is given by 



C,{Ck) = M^l 



(20) 



where 



V{q) = 1 - g'^+i + {l-q) (9'+' {k-j-l)+ j) 
+ (l-q)2(g'=(2r + A(i))-r). 

When q = 2 we have 

Cg{Ck)=M + l + 2qW*{q), (21) 

with 

(g) = 1 + {1-q) {q k + {2-q)j) + (1 - qf (1 + A (j)) . 

V. Optimal codes for TDGDs with q = 2~^ 
A. The codes 

Assume q = 2~^ for some integer k > 1. We reuse the 
notation Um = for a uniform tree of depth m, assuming, 
additionally, that its 2^ leaves have weight one. The infinite 
tree (and associated multiset of leaf weights) Vk is recursively 
defined as follows. Start from Uk, and attach to its leftmost 
leaf a copy of qVk- Thus, Vk has 2^ — 1 leaves of weight q^ at 
depth (5 + l)/c for all s>0, and no other leaves. The related 
tree V^ is defined by starting from Uk-i, and attaching to its 
leftmost leaf a copy of qVk- Thus, V^ has 2^~^— 1 leaves of 
weight q^ at depth k — 1, and 2^ — 1 leaves of weight q^ at 
depth (5 + l)/c — 1 for all s > 0. The trees Vk and are 
illustrated in Figure |4] 
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We describe a sequence of binary trees (and codes) C-k, 
which, later in the section, will be shown to be optimal for 
TDGDs with q = 2~^, k > 1. We describe the trees by layers. 
A layer is a collection of consecutive levels of the tree, 
containing all the leaves with signature s. The structure of the 
layers, and how unfolds into for all 5, are presented 
next, providing a full description of the trees C-k- 

Assume /c > 1 is fixed. We distinguish two main cases 
for the structure of L5, which depend on the value of s, as 
specified below. In the description of the layers, each tree 
structure is a virtual symbol. We will refer to both original 
and virtual symbols simply as symbols. 

Case 1) < 5 < 2^-1 - 2: 
Write 5 = 2^ + j - 1 with < i < k - 2, < j < 2' - 1. 
Layer L5 consists of nodes in two levels, arranged as follows: 



Case 2 




Fig. 5. Layer transitions in C-k for k > 2. The expressions above the 
self-loops indicate the number of iterations on the given layer type before the 
transition to the next type. 



(iii) 2^-1 - 1 < j < 2^-4: 



A A.. .A 

m mm mm i 



A A... A 



(22) 



< times 3.2''-i-2-7 ti 



j times 



(27) 



2*— j — 1 times 



J times 

(recall that the factor multiplies all the weights of objects 
inside the brackets, so that the leaves denoted ^ in (22 ) indeed 
correspond to signatures s). 

The symbol IZg represents a tree containing all the signa- 
tures strictly greater than 5, scaled by q~^ . Layer L5 emerges 
from constructing a quasi-uniform tree for 5 + 2 symbols (s + 1 
signatures 5, and the symbol IZg), attached to IZg-i of the 
previous layer if 5 > 0, or to the root of the tree if 5 = 0. We 
have s + 2 = 2* + 1 + j, < j < 2* — 1, so the quasi-uniform 
tree has 2* — j — 1 leaves at depth i, and 2j + 2 leaves at level 
i + 1, as shown in ([22]). 

Case 2) s> 2^'^ 
Write 



(iv) j = 2^-3: 

A A^..A 
qv^ R. a H mm 



? times 2'^-^ + ltin 



2^-1 _2 tijnes 



(28) 



(V) j = 2'=-2: 

M...M qVk m---m A A.. .A 

E m E mm i 



2"-'-! times 



1: 



2"~^-l times 



(29) 



l + (2^-l)^ + j, £>0, 0<j<2^-l. (23) The last layer from Case 1 contains all the signatures 

2. All signatures s > are contained in 1Zs'. 



There are five types of layers in this case, as described below. 
The symbol 1Zs in each case represents a tree containing all 
the signatures strictly greater than s that are not contained in 
other virtual symbols in L5, suitably scaled by q~^. Also, it 
will be convenient to use the notation Al as shorthand for the 
sequence 

M: qVk, (24) 

2 ''-I times 

(Al still counts as 2^ symbols in L5). 
(i) < j < 2^-1-3 (for k > 2): 

A A^^.A 



k-1 



1. 



In particular, there are 2^~^ signatures 5^ + 1 = 2 
Assume k > 2. A quasi-uniform tree with 2^~^ + 1 leaves 
is constructed, rooted at IZs'- This tree has 2^~^ — 1 leaves 
labeled + 1 at depth k — 1 from its root, and two leaves at 
depth k, one of which is labeled 5^ + 1, and one that serves 
as the root for 7^s/+i. This is consistent with the structure of 
the first layer in Case 2 shown in (25 ), with s = £ = 



q ' 



M..M E-.-E 



7^. E mm mm 



-j — l times 



J times 



(ii) j 



ifc-i 



2 : 



M...M /\ A ... A 



(25) 



(26) 



I times 



-1 times 



and j = 0. From that layer on, layers of types (i)-(v) above 
unfold following the cyclic pattern shown in Figure |5] Layers 
of types (i) and (iii) are repeated 2^~^— 2 times each in the 
cycle, which is closed by a transition from a layer of type (v) 
back to one of type (i), corresponding to an increment of the 
value of £ by one. 

When k = 2, layers of type (i) or (iii) are not used. In 
this case, the only layer in Case 1 contains the signature 0. 
A uniform tree U2 is constructed, rooted at TIq. One pair of 
sibling leaves is assigned to signature 1, while the other pair 
is assigned to IZi and Ui, attaining a configuration of type 
(ii) in Case 2. From that point on, the cyclic layer sequence 
is (ii)^(iv)^(v)^(ii). 

The fine details of the various layer transitions, justifying the 
structure in Figure [5j are given in Appendix |D] The structure 
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Casel 








Level 


1 


z=0, j=0 






AO 


z=l, j=0 








2 
3 








2X2 


4 
5 


Case 2 




3 3A 




6 
7 

8 


(i)€=0, j=0 






3 


(i) ^=0, j=l 






t 4X^ 


9 

10 
11 










12 


(ii) ^=0, j=2 




5X5 5, 


X5 5X5 


13 
14 




6A6 e/G 6^ 


16 



(iii) ^=0, j=3 





7A7 tX^ 


(iii) ^=0, j=4 




(iv) ^=0, j=5 


8^ sXs 8/ 







(v) ^=0, j=6 J^Si 



Case 2 

(i) ^=1, j=0 



17 
"I8 
19 
20 
■21 
22 
23 
■■24 
25 
26 
■■■27 
28 
29 



(i)^=l,j=l 30 



Fig. 6. Top levels comprising layers Ls for s < 11 in the optimal tree 
C_3 {q = |). Leaf signatures are noted; dotted lines separate layers L^, and 
circled nodes represent roots of trees IZg. Grayed ovals represent sequences 



is also illustrated by the example in Figure [6] which shows 
the layers for 5 < 11 in C_3. 

Due to the cyclic nature of the construction, the subtree 
T^.s, s > 2^~^ — 2 is, in general, identical to all subtrees 

In 



n 



> 0, up to appropriate scaling by q 



the example of Figure [6j the tree 1Zg is identical to the tree IZ2, 
indicated in the figure as 7^2+7^' • An additional source of self- 
similarity is provided by the trees Vk and ; in Figure [6j the 
sub-tree labeled q^^V^ is identical to that labeled q^V^ , etc. 
Overall, although the width of the tree is unbounded (driven by 
the £ copies of Ai in each layer of Case 2), the total number 
of distinct sub-trees in C-k is finite. 

The following theorem enumerates the code lengths as- 
signed to signatures by the codes C-k- It follows immediately 
from the description of the codes in (22) and ([25])-([29|). 

Theorem 5: Code /c>l, assigns code lengths A5 or 

A5 + 1 to signatures s according to the expressions for A5 



and the codeword counts in Tables [Ill| and |IVj corresponding, 
respectively, to the cases < s < 2^~^— 2 (Case 1) and 
s > 2^-1 - 1 (Case 2). 

We now present some auxiliary results that will be useful 



TABLE III 

Code lengths and codeword counts for codes C_fc on 

SIGNATURES 5, < S < 2^~^ - 2. 



Case 1: < s < 2^^" 


-1-2, s = 2^ + 7-l, 0<i<k-2 


A, = (5 + 2)(i + l)-2^+i 






Number of codewords (signatures) 


Range of j 


length As 


length As+1 


< j < 2^ - 1 


(2^-J-l) 


2j + l 




TABLE IV 




Code lengths and codeword counts for codes on 


SIGNATURES S > 2^~^ — 1. 




Case 2: s>2^-^- 


1, s = 2^-^-l+(2^-l)i+j, ^>0 


As = (5 + 2)/c-2'^ 






Number of codewords (signatures) 


Range of j 


length As 


length As+1 


< j < 2^-^-3 


(2'^-l)^ + (2^-i-j-l) 


2^+1 


j = 2^-1-2 


(2'^-l)^ 


2'^-2 


2^-1-1 <j< 2^-4 


(2'^-l)^ + 3-2'^-i-2-j 


2j+2-2'^ 


j = 2^-3 


(2'^-!)^ + 2^^-1+1 


2^-4 


j = 2^-2 


(2'^-l)^+2'^-i-l 


2'^-l 



in proving the optimality of the codes C-k- We rely on the 
following relations, which are readily derived from the defini- 
tions of the respective trees, under the assumption q = : 



(30) 



The next lemma bounds the weight of the symbol IZs in ( |22| ) 
and ([25j-([29l). 

Lemma 11: When s < 2^~^ - 2 (Case 1), we have < 



wills) < When s > 2 
wills) < 1. 

Proof: For s<2^~^— 2, we have 



2 (Case 2), we have ^ < 



wills) 



s'=s-\-l 

CO 

E 

r=0 



is' + l)q-'wis') 



(5 + r + 2)(7 



r+l 



(5 + l)(l-^) + l 

il-q)^ 



q- (31) 



The right-hand side of ( [3T] ) increases with s. Setting s = 

2^-1 - 2 = ^ - 2, we obtain wills) = ^ ^ 



2q 



2 {l-qyj^ 



which satisfies the claimed upper bound for g < |. When 
s > 2^~^ — 1, Us contains all the signatures s' > s (with 
their weights scaled by q~^) that are not contained in the 
components qVk of the groups Al, or in a possible sibling 
qUk-i or qV^ of Us. Write s as in (23). The scaled total 
weight of signatures 5' > 5 is 



Ws 



00 



29(i + i) + i 



is + 2)q 



q 



r=0 



(1-9)^ 



2(1-9) 



q 

(1-9)^ 



where the last equality follows by applying ( [23] ) and substi- 
tuting q~^ for 2^. Let denote the part of Ws that is con- 
tained in the symbols qVk, qUk-i, or qVj^ mentioned above. 



Observing the layer structures in (25 )-(29), and applying (30), 
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we obtain W' = £ - 



S = 



where: 

< j < 2^-1 - 3, 
2^-1 - 2 < j < 2^ - 3, 
j = 2^ - 2 . 



(32) 



The claim of the lemma for s > 2 — 2 follows by 
writing w{lZs) = Ws — W^, observing that w{lZs) increases 
monotonically with j, and bounding w{1Zs), as an elementary 
function of , in the interval < q < 



\ for each of the cases 



in (32). Notice that due to the mentioned monotonicity, w{7ls) 
is evaluated only at the ends of the ranges of j in ([32]), and 
we substitute for 2^. ■ 



The following is an immediate consequence of Lemma 1 1 



Corollary 3: Let TZ^ denote the virtual symbol containing 
1Zs in each layer Lg listed in (22) and (25)-(29). Then, after 
scaling by q~^, all the symbols to the left of 71^ in are of 
weight 1, all the symbols to its right are of weight 2, and we 
have 1 < ^^;(7^'J < 2. 

Proof: The claims on the symbols to the left and to the 
right of 1Z'g follow from ( 30 ) and the definition of the notation 
M in ([24l). As for 7^1, we have ^(7^'J = 1 + w{ns), and 



the claim of the corollary follows by applying Lemma [TT] ■ 
Theorem 6: The prefix code C-k is optimal for TDGD(g') 
with q = 2"^, k > 1. 

Proof: As before, we rely on the method from f?]. The 
reduced sources are defined by Ss = Hs ^ J^s^ where Hs 
denotes, as before, the multiset of signatures strictly smaller 
than s, and the multiset is essentially identical to the layer 
Lg defined in ([22]) and ([25])-([29|). The steps taking a reduced 



source to one of lower order follow the layer "unfolding" 
steps listed in the description of the codes C-k (see the 
discussion following ([22]) and ([25))-([29)), and Appendix [P]), in 



reverse order (bottom-up). It remains to show that these steps 
correspond to a valid sequence of mergers in the Huffman 
procedure. Consider a layer L^, and let ?/^2, • • • , V^at denote 



its symbols, listed from left to right, as shown in ( |22| ) and (|25)- 
(29]). It is readily verified that = 2^ for a layer ([22]), with 
i as defined in Case 1, and that N is divisible by 2^~^ in 
layers of type (i)-(ii), and by 2^ in layers of type (iii)-(v). By 
Corollary [3] the ipj are ordered by increasing weight order, 
and, since q < 1/2, the weight of any ipj is smaller than any 
weight in Hs- Thus, the Huffman procedure on Ss starts by 
pairing symbols in Lg. Now, it also follows from Corollary [3] 
that the merger of any two of the ipj results in a combined 
weight that is at least as large as any weight in the layer. 
Thus, merging ip2j-i with ?/^2j, 1 < j < N/2, is a valid 
sequence of steps in the Huffman procedure on L^. Moreover, 
since there is at most one symbol of weight different from 1 
or 2 (after scaling), and strictly between them, the resulting 
sequence of merged weights includes weights 2, uj, and 4, 
with 2 < < 4, with at most one symbol of weight uj. We 
iterate the argument until the signatures s—1 get incorporated, 
and Ijs-i gets formed (see Appendix [P]), reaching, thus, the 
reduced source Ss-i. Proceeding recursively, we reach the 
reduced source Sq, which coincides with the layer Lq. As 
described in ( [22] ) for s = 0, this layer consists of one virtual 
symbol formed by 1Zo and the symbol joined under the root 




/ 5 5 5 5 5 
Fig. 7. Top of the limit tree C-oo- 

of the tree C-k (thus, the Huffman procedure on <So is trivial 
in this case). ■ 

B. A limit code 

The sequence of optimal codes C-k stabilizes in the limit 
of /c ^ oo (g ^ 0), as stated in the following corollary. 

Corollary 4: When k^oo, the sequence of optimal trees 
C-k converges to a limit tree C-oo that can be constructed 
as follows: start with Qn for n=2, recursively replace the 
leftmost leaf of the deepest level of the current tree by Qn+i, 
and increase n. 

Proof: The corollary is proved by observing that the part 
of the tree corresponding to < s < 2^~^ in Theorem [6] 
remains invariant for all k' >k. This corresponds to the layers 
Lg of Case 1. ■ 

The limiting property of C-oo in connection with the TDGD 
is mentioned also in |JjJ Ch. 5]. Figure [7] shows the first 
fourteen levels of C-oo- Notice that the first eleven levels 
coincide with those of C_3 in Figure [6] up to reordering of 
nodes at each level. Explicit encoding with C-oo can be done 
as follows. Given a pair (i, j), with signature s = we 
write s = 2^ — 1 +r, with < r < 2^ — 1 and t > 0. We encode 
with a binary codeword xy, where x = i(^-i)(s+i)+2r+i 
identifies the path to the root of the quasi-uniform tree that 
contains all the leaves of signature s, and y = (55+2(^+1)- The 
resulting code length distribution for signature s is: 2^ — 1 — r 
signatures encoded with length (t — l)(s + 2) + 2r + 2, 2r + l 
signatures encoded with length {t — l)(s + 2) + 2r + 3. 

The following corollary shows the average code length 
attained by C-oo on an arbitrary TDGD. 

Corollary 5: The average code length of the limit code 
C-oo under TDGD(g) is given by 

1 



^q{C-oo) — 1 



l-q 



^/(2*(l-g) + 2). 



t>0 



Proof: For 5 > 0, let r and t, t > 0, < r < 2^ 
the (uniquely determined) integers such that s = 2^ - 



- 1, be 
1 + r. 
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By Corollary [4] and the ensuing discussion, we can write 



t>0 s=2*-l 



where 



s) = ( (t - + 2) + 2r + 2 ) (5 + 1) + 2r + 1 . 

Substituting r = s — 2^ + 1 and carrying out the inner 
summation in ([SSl, we obtain 



(34) 



t>0 



for some functions A{t) and B(t). It can be verified by 
symbolic manipulation that 

1 - + 2g 



5(0) 



and 



A(t - 1) + = q 



(l-q)3 

2* -2*9 + 2 



Substituting in (34), after rearranging terms, we obtain 



£,(C_oo) = (l-<z)2(^B(0)+|]/-i(A(t-l)+B(t)) j 



(1-9) 



2[l-q'^+2q ^ 2*2*-2*g+2 



(l-g)3 (l-q)3 



l + ^^/(2*(l-q) + 2). 



t>0 



VI. Practical considerations and redundancy 

In a practical situation, one could use the codes Ck for 
q > \, and the codes C-k for q < ^. However, a lower 
complexity alternative, which incurs a modest code length 
penalty (as shown in Figure [8]), is to use C-oo in lieu of 
the codes C-k, up to the value of q where switching to Ci 
gives better average code length. The crossover point is at 
q ^ 0.33715. 

Encoding a symbol pair (x, y) with a code C/c is of about the 
same complexity as two encodings of individual symbols with 
a Golomb code of order k. As described in Theorem [2) the 
encoding with Ck entails unary encodings of [x/k\ and [y/k\, 
which would also be needed with the Golomb code. Given the 
profile of the top code Tk = Tfj^^ck^ determined in Theorem |4j 
encoding with Tk requires comparing the index of the pair 
(x mod /c, y mod k) with at most two fixed thresholds, to 
determine the corresponding code length (which can assume 
up to three consecutive integer values). The codeword is then 
computed directly from the index. Each encoding with the 
Golomb code, on the other hand, requires one comparison 
with a fixed threshold to determine the code length of each 
Qk component, or a total of two for the pair (x^y). 

As in the one-dimensional case (see, e.g., Q, ifTSl ), when 
encoding a sequence xi, X2, . . . , X2tj . . the best code for the 



next pair (x2t-i,X2t) can be determined adaptively, driven 
by the sufficient statistic St = ^"^XljL^^^j- The crossover 
points for the estimates of the code parameter k can be 
precomputed and stored in terms of the statistic St. The one- 
dimensional code has a slight advantage in the adaptation, 
in that it can adapt its statistic with every symbol, whereas 
the two-dimensional code can only do it every two symbols. 
Depending on the application, this advantage is likely to be su- 
perseded by the redundancy advantage of the two-dimensional 
code. Also as in the one-dimensional case, there are certain 
complexity advantages, in both encoding and adaptation when 
using the subset of parameters of the form k = 2^. In this 
case, an adaptation strategy that estimates the best parameter 
r directly from the statistic 5*^, without the need to compare 
it with precomputed crossover points, can be derived for the 
codes Ck, as was done in O and ifTSll for two-sided geometric 
distributions. We omit the details, since both the technique and 
the resulting parameter estimation method are similar to those 
in the references. 

Figure [8] presents plots of redundancy for various code 
families as a function of q, measured in bits per integer symbol 
relative to the entropy of the geometric distribution (recall 
that the latter is given by H{q) = where h{q) is the 

binary entropy function |2|). Plots are shown for the optimal 
prefix code for each value of q (estimated numerically over a 
dense grid of values of q, and in sufficient precision to make 
the estimation error smaller than the plot resolution), the best 
Golomb code, the best code C-k or Ck for each q, and the 
limit code C-oo- Here, "the best Golomb code" means the 
code Gk that minimizes (over k) the code length for the given 
value of q; similar minimizations are used for the best codes 
C-k and Ck for each q. In the figure, we can observe the 
advantage in redundancy for the codes C-k (or C-oo) and 
Ck over Golomb codes, except in the region where the best 
codes of both types are equivalent (i.e., the optimality regions 
of Ci and C2). The redundancy advantage is near 2 : 1 (as 
expected) at the limit of q ^ and it peaks near g = 0.28 (at 
more than 13.6 : 1). A redundancy advantage close to 2 : 1 is 
observed also sls q ^ 1. The advantage of over symbol-by- 
symbol Golomb codes is consistent with Corollary [T] and, in 
fact, the plot in Figure [8] can be regarded as "visual evidence" 
for the corollary. Figure [9] plots the corresponding curves for 
the relative redundancy, i.e., the redundancy normalized by 
the per-symbol entropy H{q) for each plotted value of q. We 
observe that although the relative redundancy for all the codes 
considered converges to zero, as expected, when q ^ 1 (since 
H{q) 00), the decay is very slow for most of the interval, 
and the curves fall to zero "suddenly", with infinite slope, near 
q = 1. This is due to the slow rate of growth of H{q), which 
behaves asymptotically as — log(l — q) near the limit point. 

It is apparent from Figure [8] that as the redundancy of the 
codes Ck peaks in the transitions between one "best" value 
of k and the next, the estimated redundancy of the optimal 
codes remains rather flat. This poses the question, which 
also remains open, of whether other sequences of codes with 
simple descriptions and encoding/decoding procedures could 
be found, that would more closely track the redundancy curve 
of the optimal codes. 
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Fig. 8. Redundancy (in bits/integer symbol) for the optimal prefix code (estimated numerically), the best Golomb code, the limit code C-oo, and the best 
code or for each value of q, (A) < g < |, (B) | < g < 1. The limit code C-oo is plotted up to g = 0.33715 . . ., where its curve intersects 
that of Ci (or, equivalently, C_i). 



The asymptotic behavior of the redundancy of Ck in the redundancy per symbol, namely, R{k) = ^Cq{Ck) — H{q). 



regime q ^ 1, shown in more detail in Figure 10 is oscilla- 
tory, as is also the case for Golomb codes |2|. The limiting 
behavior of the redundancy can be characterized precisely, as 
we show next. 

Corollary 6: Let Xk = 2^//c^, where M is as defined in 
Theorem H] As /c ^ oo, the redundancy of the code C/c at 

q = 2-i/^is 



R{k) =^ (1 + log Xk) +2^-2^^ 1^1+ 



log e 



log(eloge) + o(l) . 



(35) 



Remark. We have | ~ A/^ ^ | , where ^ denotes inequality 
up to asymptotically negligible terms. For large k, sls k 
increases, Xk sweeps its range decreasing from | to |, at 
which point increases by one, and resets to |, starting 
a new cycle. 

Proof of Corollary |5[ We derive, from ( [2T] ), an asymp- 
totic expression for the code length Cq{Ck)- To estimate the 
parameter j in pT) , we need to solve the quadratic equation 
A(x) = 0, with A(x) as defined in Theorem |4] Writing 
2^ = A/c/c^, it is readil y verifie d that the largest solution to 
the equation is ^ = {2^Xk - ^ - l) ^ + 0(1) = a fc + 0(l). 
Thus, j = ak^ 0(1), and q^ = 2"" + 0{k-^). Writing also 
(7 = 2-1/^ = 1-^ + 0(^-2), and noting that A(j) = 0{k), 
we obtain, from ( [21] ), 

^g(Cfc) = M + 1 + 2^-^(1 + (l + a) ln2) +o(l). 

As for the entropy, we have 

-q log q 



log(l — q) = log(e log e) + log k + o(l) 



The claimed result p5\ follows by substituting the asymptotic 
expressions for Cq{Ck) and H{q) in the formula for the 



log(e log e) + - (M - log A^) + o(l) . 



The limits of oscillation of the function can be obtained 
by numerical computation, yielding Ri = liminffc^oo R{k) = 
0.014159... and R2 = Mmsupk^^ R{k) = 0.014583.... 



These limits are shown in Figure 10 The corresponding limits 
for the redundancy of the Golomb codes are, respectively, 
R[ = 0.025101. . . and = 0.032734. . . (1. 

Corollary |6] applies to the discrete sequence of redundancy 
values at the points q = . It is not difficult to prove 

that the same behavior, and in particular the limits R\ and 
R2, apply also to the continuous redundancy curve obtained 
when using the best code C/c at each arbitrary value of q. This 
follows from the readily verifiable fact that as q varies in the 
interval 2~^/^ < q < 2~^/^^+^\ the maximal variation in 
both the code length under Ck and the distribution entropy 
is bounded by 0{k~^). Figure 10 suggests that the same 



oscillatory behavior might apply also to the redundancy curve 
of the optimal prefix code for each value of g'. It follows from 
the foregoing discussion that this is true for the limit superior 
R2 . The question remains open, however, for the limit inferior 
Ri, which is an upper bound for the limit inferior of the 
optimal redundancy. 

Appendix A 
Proofs for Subsection IIV-BI 

We recall that we consider a 4-uniform probability distri- 
bution p = (pi,P2, • • • ,PAr), where probabilities are listed 
in non-increasing order, and an optimal tree T for p, with 
/t < 2. We define m = [logA^], and we denote by the 
number of leaves at depth £ in T. 

Proof of Lemma^ Say T has t > leaves at depths i < 
m—2. Then, T has no leaves at depths I' > m, and it can have 
a total of at most 2^"^ -3t leaves altogether. But TV > 2^"\ 
a contradiction. Say now that T has nodes at depth m+2. Then 
all of its leaves must be at depths £' > m, and some must be 
at depths strictly greater than m. Thus, T, being full, must 
have more than 2^ > N leaves, again a contradiction. The 
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Fig. 9. Relative redundancy (redundancy normalized by the per- symbol entropy) for the codes of Figure [8] The interval 0.5 < q < 0.75 is omitted from 
(B), as the best codes Ck and Gk coincide over that interval. 



■ Code Ck 
- Optimal code 




0.0140 



0.950 0.955 0.960 0.965 0.970 0.975 0.980 0.985 0.990 0.995 1.000 



Fig. 10. Redundancy as g— )►! (k^oo). Dashed lines show the asymptotic 
limits Ri and R2 . The inset closes up further on a narrow segment, showing 
the redundancy of the codes vs. the asymptotic estimate {35). 



second claim of the lemma is a straightforward consequence 
of /t < 2. ■ 

Proof of Lemma^ Let Nt = (nM- 1, '^m, '^M+i) be the 
compact profile of a tree T with N leaves and /t < 2. Clearly, 
nM+i must be even, and we write riM+i = 2c for some 
nonnegative integer c. The components of Nt must satisfy 



(36) 



By Kraft's equality, which must hold for the full tree T, we 
have 

4nM-i + 2nM + 2c = 2^+^ , (37) 

which holds also in the case c = 0. From ([36]) and ([37]), we 
obtain 

UM-i = 2^-7V + c. (38) 
Now, from ([38l) and ([36l), we obtain 



riM 



27V -2 



M 



3c. 



(39) 



Equations ( [38] ) and ( [39] ), together with the definition of c yield 
the profile dS]). The valid range of variation of c is determined 



by the non-negativity constraints on the entries of the profile. 
When M = m - 1 (cr = 1), the lower limit = N - T^-^ 
is determined by the nonnegativity of um-i- Since 2^ > N 
when M = m, the lower limit is the trivial Cn = in this case. 



2N-2 



] is determined 



In both cases, the upper limit c^ = [ 
by the nonnegativity of um- ■ 
Proof of Lemma [§[ For a given value of cr G {0, 1}, 
assume c and c' are indices such that c^ < d < c <Crr, and 



let Scr be the segment of s corresponding to a. By (10) and 
the mono tonicity of the weights, we have 



Dcr.d = PN-2c' + 1 
< PN-2C+1 

Thus, if A 



-pN-2c'+2 — P2M-N+C' 
PN-2C+2 — P2M-N+C = ^o-,c • 

o-,c < then Dcr^c' < 0, and if D^j^c = then 



Da,c' < 0. It follows that s^- is non-decreasing. It remains to 
prove that —sg{Di^c^-\-i) ^ sg(Z^o,i)- Assume that Dq^i < 0. 
Then, we have 

^l,Ci + l P2^-N-i +P2--Ar -Pi> 2p2--Ar+i -Pi 

> 2 {pn-1 + Pn) -Pi> XpN -Pi > , 

where the equality follows from ( [TO] ) and the definition of c , 
the first and third inequalities from the monotonicity of p, 
the second inequality from our assumption on Dq^i, and the 
last inequality from the 4-uniformity of p. Hence, we must 
have Di^c^-\-i ^ 0. Similarly, if Dq^i < 0, then we must 
have l^i,c,+i > 0. Therefore, -sg (l^i,c,+i) < sg(i:>o,i), as 
claimed. ■ 
Proof of Theorem [Jf The theorem follows directly from 
Lemma [8j observing also that by the assumptions of the 
theorem, and by Lemma [7] at least one of the trees Tcr,c , 
(l,ci) ^ (cr, c) ^ (0,co) must be optimal for p. ■ 

Appendix B 
Proofs for Subsection IIV-CI 

We derive the proof of Theorem [4] through a series of 
lemmas. We recall that we seek an optimal tree for the source 



17 



of ([t]), with vector of (unnormalized) weights 

with q = 2~^/^, and where is repeated j + 1 times for 
< j < k-1, and 2k - 1 - j times for k < j < 2k-2. 
For succinctness, in this appendix, when we say "optimal" we 
mean "optimal for Ak-' Notice that, in p, three consecutive 
weights are never distinct; we refer to this fact as the "three 
consecutive weights" property. Throughout the appendix, we 
assume that /c > 2, as we recall that optimal trees for /c = 1, 2 
are fully characterized in Remark [2] following Theorem [2] 

Lemma 12: Trees T^r^c with c = c^r are not optimal. 
Consequently, the profile (nM-i, ^m, ^m+i) of an optimal 
tree has um > 3. 

Proof: Recalling the profile N^^^ in ([8j), with c = c^r and 
/c > 2, we have um ^ {0, 1,2}, um-i ^ 1 and um+i > 2. 
Let q^ be the lightest weight on level M — 1. By the "three 
consecutive weights" property, the two heaviest weights on 
level M + 1 are greater than or equal to q^^'^. Recalling the 



It follows from Lemma 13 that in an optimal tree, the 
heaviest weight on level M is covered by ([12]) in Lemma [9] 
(and, thus, so is any weight on level M — 1), while the lightest 
weight on level M is covered by ([13]) in that lemma (and, thus, 
so is any weight on level M + 1). Consequently, an optimal 
tree is completely determined by a tuple j = (j, r,/, r'), with 
< j,/ < A: - 1, < r < j, and < < /. The profile of 
the tree is then given by 



riM-i 
riM 



2 

/(/ + 1) 



(40) 

(41) 
(42) 



The following lemma presents a characterization of the least 
value of c for which T^r^c is optimal. The lemma follows 



immediately from Theorem [3] and Lemma 10 



Lemma 14: Let be the least value of c such that T^^c 
is optimal. Then, either D^j^c +i ^ (with = c^), or 



Da^ck < and D^^c^+i > (with Ck > c^). 



F{j, r, /, rO = 2e - 2^+^ + j (j + 1) + 2r - 



expression for D^^ c in ( 10), and the interpretation that follows 

it, we obtain D^^c^ > q^l -2q^) > 0. Thus, by Theorem |3j Define the function 
Tc^ is not optimal. An optimal tree Tcr,c would, therefore, have 
c < Ccr, and, thus, um > 3. ■ 
The following lemma gives a first, rough approximation of 
the distribution of weights by levels in an optimal tree Tcr,c, 
which will allow us to identify the appropriate range (i.e., ( [T2| ) 
or ( [T3] )) for the heaviest and the lightest weights on level M 
of the tree. 

Lemma 13: Let T^r^c be an optimal tree, and let q^ and 
denote, respectively, the heaviest and the lightest 



/(/ + 1) 



(43) 

acting on tuples j = {j^r^j'^r') for a given value of k. Next, 
we derive a set of conditions on the tuple j corresponding to 



the tree T^j^ck characterized in Lemma [14 



weights on level M of the tree. Then, we have j < k — 1, 
j' <k- 1, and j + / < k. 

Proof: Consider first the case where c > c^, i.e., all 
the components of the profile N^^^ are positive. The lightest 
weight on level M — 1 of the tree immediately precedes q^ in 
p. Hence, it is of the form q^~^ , with e G {0, 1}. On the other 
hand, reasoning similarly, the heaviest two weights on level 
M+1 are of the form g2/e-2-j'+£' ^2k-2-j'+e'+e" ^ where 

e' ^e" G {0, 1} and e' -^e" <1 (due to the "three consecutive 
weights" property). Since T^^c is optimal, by the definition of 
Dfj^c in ([9]), we must have D^j^c ^ 0. Applying ([TO] ), the above 
constraints on e^e' ^e" , and the fact that q^ = iTwe get 



Lemma 15: Let j = {j^r^j'^r') be the tuple defining the 
profile of T^^ck in (|40|)-(|l2]). Then, 

F{j,rj\r') = 0, (44) 

and exactly one of the following conditions holds: 

(i) jj' > 0, j ^ f = k - 2. Either r = and < < /, 
or 1 < r < j and r' G {0, 1}. 

(ii) j,f > 0, j +/ = A: - 1, r = and / G {0, 1}. 

(iii) / = 0, = 0, j G {/c - 2, /c - 1}, < r < j. 

(iv) j = 0, r = 0, / G {A; - 2, A; - 1}, < < /. 

Conversely, if j = {j^T^j' ^r') satisfies ( [44| ) and one of the 
conditions (i)-(iv), then j defines Tfj^ck • 



> D,, 



^j-e ^ ^2k-2-j'+e' ^ ^2k-2-j'+e'+e" 



Proof: The necessit y of (44 ) follows from the definition of 
F{j^r^ j' ^r') and from (38), setting c = ^um+i, substituting 
the expressions from (|40|) and (|4T]) for um-i and um+i, 



> 



Thus, j+/ < k. Since both j and / are positive when c > c^, 
the claim of the lemma follows in this case. 

Consider now the case where c = c^, i.e., T^r^c is a quasi- 
uniform tree. If a = 0, we have um+i = 0, and, thus, the 
lightest weight on level M is Pk^ = q^^~'^, and / = 0. For 
the heaviest weight on level M, we have p2^-/c2+i = 0.-^ • 
By ([l4|), we have 2'^-A:^ + l < k(k^l)/2. Recalling the order 
and structure of p, we obtain q^ = p2^-k'^+i ^ Pfc(fc+i)/2 = 
q^~^ . Thus, 7 < k — 1. The case of c = and a = 1 



respectively, and rearranging terms. In fact, ( [441 ) niust hold for 
any optimal tree, not just for c = c^. Conditions (i)-(iv) will 
follow from an exhaustive case study of configurations that 
yield the inequalities on the quantities 
the point c = Ck, as stated in Lemma [14 



that characterize 



Consider, first, the case where Ck > c^. Then, for c = Ck, 
by Lemma [T4| we have Dcr^c < and I^cr,c+i ^ 0. Writing 



down the expressions for D^^c and I^cr,c+i explicitly according 



is argued similarly, using (16) in lieu of (14), and leading to 
j = and j' <k-l. ■ 



to ([T0|, we observe that six weights are involved, as illustrated 

^a,c to a 

to p2^-/c2+c+i. or an increase from Pk'^-2c+i +Pk^-2c+2 to 
Pk'2-2c-i +Pfe2-2c. or both. By the definitions of j and /, we 



in Figure 11 In order to switch from a negative Da 
nonnegative I)cr,c+i. we must have a decrease from P2M_ 



have P2^-/c2+c+i = and Pk2-2c 



^2k-2- 



. Taking into 
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decrease increase 



— e • — — • • e e — 

• • • P2^-/c2+c P2^-/c2+c+l ••• P/c2-2c-l Pfc2_2c P/e2_2c+l P/e2_2c+2 ••• 
qj-£ qj q2k-2-j'-e' q2k-2-j' ^2k-2-j'+e"q2k-2-j'+e"+e"' 

Fig. 11. Weights involved in the conditions for c = c^: O weights in Da,c , ^ weights in Dcr,c+i • 



account that consecutive weights can vary at most by a factor 
of q, we can write, for the other weights involved, 



P2^-/c2+c 
Pk^-2c-l 
Pfe2-2c+l 
P/c2-2c+2 



2k-2-j'-s' 
d •) 

2k-2-j'+E" 
d 1 

2k-2-j' +£"+£"' 



(45) 
(46) 
(47) 
(48) 



G {0, 1}, and, due to the "three consecutive 
weights" property, we must have e' + e" < 1 and s" + s"' < 1. 
Table |V| summarizes the patterns of values of e = (e, e[e''e"') 
that satisfy these constraints and also produce the combination 
of weight increases or decreases necessary to satisfy the 
conditions for c = Ck. On the right column of the table, we list 
the conditions imposed on j by the constraints of each case. 
To illustrate the proof approach, we derive these conditions, 
below, for the representative case e = (1,0,0,1). The other 
cases follow using similar arguments, which are also similar to 



those used in the proof of Lemma [13] (here, more parameters 
are assumed known, which allows us to obtain tighter bounds). 

Assume e = (1,0,0,1). Then, writing the conditions on 
I)cr,c and I^cr,c+i at c = c/e explicitly, substituting for the 
weights using the known values in e, and recalling that = ^, 
we obtain 

> D^^c =P/e2_2c+l +P/c2-2c+2 -P2^-/c2+c 

> 2q^^-^-^' 



and 



< ^a,c+l =Pfe2-2, 



^2k-2- 



q 

2q 



2k-2-j' 



,_1 +P/c2_ 



■P2^-/c2+c+l 



-2-/ 



It follows that k — 2 < j -\- j' < /c — 1, as claimed in the 
second row of Table |vl The conditions on r and follow 
from Lemma [9j observing that r resets to zero at points where 
j increases, and similarly with relative to j\ In this case, 
P2^-/c2+c is the last weight of the form q^~^, and, thus, we 



have um- 



c = j(j + l)/2 and r = 0; scanning 



p from right to left, Pk'^-2c+2 is the last weight of the form 



TABLE V 

The possible cases for e = {e, e', e'Ie'") from ([45|)-([48), and the 

CONDITIONS IMPOSED ON (j,r,j',r') AT C = Cfc. 





Conditions on (j, r, r) 


(1,0,0,0) 


j + / = /c - 2, r - 0, 2 < / < / - 1 


(1,0,0,1) 


j +fe{k-2,k- 1}, r = 0, / = 1 


(1,0,1,0) 


j +fe{k-2,k- 1}, r = 0, / = 


(0,0,0,1) 


j + / = /c - 2, 1 < r < j, / = 1 


(0,0,1,0) 


j + / = /c - 2, 1 < r < j, / = 


(1,1,0,0) 


j -\- j' — k — 2, r — r' — j' 


(1,1,0,1) 


j +fe{k-l,k- 2}, r = 0, / = / = 1 


(0,1,0,0) 


case cannot occur at c = Cfc 


(0,1,0,1) 


j + / = /c - 2, 1 < r < j, / = / 1 



Consider now the case where Ck = c^. In this case, the 
tree is quasi-uniform. When = 0, since nM+i = 0, we 
have / = = 0. The condition j < k — 1 was established 
in Lemma 13 while the condition j > k — 2 follows directly 
from I^cr,c^+i = ^cr,i ^ 0- Thus, Condition (iii) of the lemma 
is satisfied in this case. Similarly, when Ck = and a/e = 1, 
we have j = r = 0, j^<k — 1 was established in Lemma 13 
and j' >k — 2 follows from D^ c^ ^ 0- Thus, Condition (iv) 
of the lemma is satisfied in this case. 

To prove the sufficiency of the conditions of the lemma, 
we first claim that, with j satisfying the conditions, the profile 



N = (nM-i, ^M, ^M+i) defined in (40)-(42) defines a valid 
tree. Clearly, um-i and um+i are non-negative. To verify that 
um is also non-negative, we write 



< 



2 

(J+/ 



1)^ 



2 



> 



q2k-l-j' 

and r' = 1. 

It is readily verified that all the cases on the right column 
of Table [V] satisfy either Condition (i) or Condition (ii) of the 
lemma. 



where the inequality follows from the fact that (a+6+1)^ 
a(a+l)+6(6+l) for a, 6 > 0, and from the inequalities r < 
j and r' < j' . With j+/ < k—1, it follows that um-i + 
^M+i < k — 1 ^ k'^ /2 < k'^ . Hence, um, as defined in (42), 
is positive. On the other hand, ( [44| ), together with the fact that 
the components of N add up to /c^, is equivalent to the Kraft 
equality for N. Therefore, N defines a valid tree T^j^c It is 
readily verified that if either Condition (i) or (ii) is satisfied, 
then the parameters (cr, c) of Tcr,c satisfy c > c^, D^^c < 

" Ck- 



and, thus, we have um+i = 2c = /(/ + l)/2 + 1, 0, and I^cr,c+i < 0- Thus, by Lemma [8j we have c 

Similarly, if either Condition (iii) or (iv) is satisfied, we have 
c = c^, D^^c^+i > 0, and, again, c = c^. ■ 
The following lemma explores some properties of the func- 



tion A{x) defined in (17). 
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Lemma 16: (i) For any x, we have A(x+1) = A(x)+x+/c . 

(ii) We have A(-l) < and A(fc) > 0. Thus, xq, the 
largest real root of A, satisfies —l<XQ<k. 

(iii) The values A(/c — 1) and A(/c — 2) are even integers. 
Proof: (i) The claim is readily verified by direct applica- 
tion of ( pTl ). 



1) < 



(ii) Setting x = —1 in (17), and recalling that Q = k'^ 

\k{k-l)/4:] and M = [logQl, we obtain 

k{k- 



= 2{Q 



2^) 



2^ik mod 4)G{2,3}) 
l(/c mod 4)G{2,3} + ^(Q — 2^), 



where = 1 if the predicate V is true, or = 
otherwise. It follows that A(— 1) can be positive only if 
(A: mod 4) G {2,3} and Q = 2^. Writing Q = Q{k), 
and computing explicitly Q{4£ + 2) = {4£ + 3){3£ + 1) and 
Q{4£ + 3) = + 1){12£ + 7), we conclude that Q has at 
least one odd divisor when {k mod 4) G {2, 3}. Therefore, 
we must have A(— 1) < 0. 

Furthermore, since Q < 2^ <2(5 — 1, we have 



A{k) = 2^ 
> 2k^ 



■2^+^ ^k{k^l)-l 
4Q + + 1) + 1 
'k(k-l) 



1) 



= -2A:^+4 ^' ^k(k- 

4 ^ 

2k^ + k(k - 1) + + 1) + 1 = 1. 



> 



Thus, A(A:) > 0, and, since the coefficient of x^ in A(x) is 

\, xo must be in the claimed range. 

(iii) By direct computation, we have A(/c — 1) = 2A:^ — 
2^+i + (i^-l)i^and A(i^-2) = 2fe2_2M+i^(^_2)(^_l), 

Since /c > 2 and M > 0, both values are even. ■ 
To complete the proof of Theorem |4j we will construct a 
tuple j = (j, r, /, r') that satisfies the conditions of Lemma 15 
and, thus, defines the sought parameter pair (cr/c,c/e). 

Proof of Theore m [?| It follows immediately from the 
definition of A(x) in ( 17 ) and of F(j, r, /, r') in (43 ) that for 
j, r, j', t' we have 



= A(j) + 
When 2' — k 



{k-j-2){k-j-l) + 



2 2 
j — 2, this reduces to 

i^(j>,/,rO = A(j) + 2r-/, 

while with j' = k — 1 — j get 

/, rO = A(j) + 2r - / - (fc - j 



2r — r 



(49) 



1). 



(50) 



We will use these relations to verify that the solutions con- 
structed below satisfy ( [44| ). Let xq be the largest real root 
of A(x), and let ^ = [xqJ. By Lemma 1611), we have 
-1 < ^ < A:, A(0 < 0, and A(^ + 1) > 0. We consider 
three main cases for A(^), and for each case (and possible 
sub-cases) we define a tuple j = (j, r,/, r') and verify that it 
satisfies the conditions of Lemma [15] 



2) 



3) 



-A(0 
-A(j) 



< 2^: Let j = ^ 



-A(i) + 1 



and 



b) 



c) 



r = —1^{J) mod 2. By the assumptions of the case on 
A(^), we have j > 0. As for /, we have the sub-cases 
below. At the end of each sub-case, we note which of 
Conditions (i)-(iv) of Lemma [15] is satisfied. 

a) j = : We must have A(0) = 0, so we get r = 
= 0, and we set / = /c — 2 (Condition (iv)). 
j e {k -2,k By Lemma |l6];iii), A(j) is 

even, and r' = 0. We get r = — ^^=^^~and < r < j 
by the assumptions on A(^), and we set / = 
(Condition (iii)). 

< j < k — 2: Set / = k—2—j. From the choices 
for r and r^ we get < r < j and < < 1 < / 
(Condition (i)). 

To verify that ( [44] ) is satisfied, we apply (49) for sub- 
cases a) and c), and for sub-case b) with j = k — 2. We 
apply ([50]) for sub-case b) with j = k — 1. For example, 



for sub-case c), by (49) and the definitions of r and r', 
we have, 

F{j,rj\r') = A{j)^2r-r' 

Verification of F = for the other sub-cases follows 
along similar lines. 

-A(0 G {2^ + 1, 2^ + 2} : Let j = ^ + 1. By 
Lemma [T6|ii), we have < j < k. We claim that 
j < k — 1. Assume, contrary to the claim, that j = k. 
Then, -A{k - 1) = -A(0 = 2k - s with e G {0, 1}, 
and, by Lemma p^i), we have A(^ + 1) = A{k) = 
A{k -1) + 2A:-1 = e - 1 < 0, contradicting 
Lemma 16 li), which establishes A(^ + 1) > 0. Thus, 
we have < j < k — 1, and, defining j' = k — 1 — j, 
we also have < < /c — 1. By Lemma [T6]^i), we have 
A(j) = A(^ + l) = A(^)+^ + A:, and, by the conditions 
of the case on A(^), we get A{j) G {k — j^k — j — 1}. 
Define r = 0, and r' = A{j) — {k — j — 1), which 
implies r' G {0, 1}. Thus, whenever < j < k — 1, 



j = (j^^^/^^O satisfies Condition (ii) of Lemma 15 
When j = 0, j satisfies Condition (iv), and when 
j = k — 1, it satisfies Condition (iii) as long as = 0. 
We claim that when = 1, we must have j < k — 1. 
Otherwise, if = 1 and j = k—1, then, by the definition 
of r', we have A{k-1) = A{j) = r' + (fc - j - 1) = 1, 
contradicting Lemma [T6|iii). Thus, j satisfies one of 
the conditions (ii)-(iv) of Lemma [15] By ( [5Q| and the 
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ii), 



definitions of r and r\ j also satisfies ( [44] ). 
$-A(\ell) \ge 2\ell + 3$: Let $j = \ell + 1$. By Lemma 16(ii), we have $0 \le j \le k$. We claim that $j \le k - 2$. Assume, contrary to the claim, that $j = k - 1$. Then $\ell = k - 2$, and, by the assumptions of the case, we have $-A(k-2) \ge 2(k-2) + 3 = 2k - 1$. Applying Lemma 16(i), we get $A(\ell + 1) = A(k-1) = A(k-2) + (k-2) + k = A(k-2) + 2k - 2 \le -1$. This contradicts Lemma 16(ii), since we must have $A(\ell + 1) > 0$. Similarly, if $j = k$, then $-A(k-1) \ge 2k + 1$ and $A(k) = A(k-1) + 2k - 1 \le -2$, again contradicting Lemma 16(ii). Thus, we have $0 \le j \le k - 2$, and we can define $j' = k - 2 - j$, which also satisfies $0 \le j' \le k - 2$. By Lemma 16(i) and the conditions of the case on $A(\ell)$, we have $A(j) = A(\ell + 1) = A(\ell) + \ell + k \le k - \ell - 3 = k - 2 - j = j'$. Define $r = 0$ and $r' = A(j)$, satisfying $0 \le r' \le j'$. Thus, $\mathbf{j} = (j, r, j', r')$ satisfies Condition (i) of Lemma 15. By (49) and the definitions of $r$ and $r'$, $\mathbf{j}$ also satisfies (44).

Cases 1-3 above cover all possible values of $A(\ell)$. In all cases, we have exhibited an explicit tuple $\mathbf{j} = (j, r, j', r')$ that satisfies the conditions of Lemma 15 and, therefore, defines the optimal tree $T_{k^2, c_k}$. It can readily be verified that the definitions of $j$ and $r$ in (18) summarize the corresponding definitions in the cases of the proof, with the top branch of (18) corresponding to Case 1 and the bottom branch to Cases 2 and 3. Furthermore, the definition of $c_k$ in (19) reflects the parameter $c = n_{M-1} - 2^{M} + k^2$ in the profile (40)-(42) defined by $\mathbf{j}$ for $c = c_k$. ■

$m, n \ge 0$, the average code length under $\mathcal{C}_k$ is

$$C_q(\mathcal{C}_k) = \sum_{0 \le i,j < k} \sum_{m,n \ge 0} (1-q)^2 q^{(m+n)k + i + j} \bigl(\lambda_{T_k}(i,j) + 2 + m + n\bigr) = \frac{2}{1-q^k} + \frac{(1-q)^2}{(1-q^k)^2} \sum_{0 \le i,j \le k-1} q^{i+j} \lambda_{T_k}(i,j) = \frac{2}{1-q^k} + C_q(T_k), \qquad (51)$$
where the second equality follows from elementary series computations, and the third identifies the (normalized) average code length of the code $T_k$ defined in Theorem 4. Denote by $W_{M-1}$, $W_M$, and $W_{M+1}$ the total normalized weight of the symbols in $A_k$ to which $T_k$ assigns lengths $M-1$, $M$, and $M+1$, respectively. Then the average code length of $T_k$ is given by

$$C_q(T_k) = (M-1) W_{M-1} + M W_M + (M+1) W_{M+1} = M + W_{M+1} - W_{M-1}, \qquad (52)$$

where the last equality uses $W_{M-1} + W_M + W_{M+1} = 1$.

Proof of Corollary 7: By the structure of $\mathcal{C}_k$ in Theorem 2, it suffices to prove that $Q_k \cdot Q_k$ is not optimal for the finite source $A_k$. Let $h = \lceil \log k \rceil$ and $a = 2^h - k$, with $0 \le a < 2^{h-1}$. From the profile of $Q_k$ given earlier, one derives the profile of $Q_k \cdot Q_k$, obtaining

$$\mathbf{n}(Q_k \cdot Q_k) = (n_{2h-2}, n_{2h-1}, n_{2h}) = \bigl(a^2,\ 2a(k-a),\ (k-a)^2\bigr).$$
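The profile of $Q_k \cdot Q_k$ can be cross-checked by brute force. The sketch below assumes that $Q_k$ is the quasi-uniform code with $a = 2^h - k$ codewords of length $h-1$ and $k-a$ codewords of length $h$, which is the profile that $(a^2, 2a(k-a), (k-a)^2)$ presupposes. It adds codeword lengths over all $k^2$ pairs and compares the result, and its Kraft sum, with the closed form. The function names are illustrative only.

```python
from fractions import Fraction


def quasi_uniform_lengths(k):
    """Lengths of a quasi-uniform prefix code for k >= 2 symbols: with
    h = ceil(log2 k) and a = 2**h - k, a symbols get h - 1 bits, k - a get h."""
    h = (k - 1).bit_length()          # ceil(log2 k) for k >= 2
    a = 2 ** h - k
    return [h - 1] * a + [h] * (k - a)


def product_profile(k):
    """{depth: count} for the product code Q_k . Q_k (pair lengths add)."""
    profile = {}
    for x in quasi_uniform_lengths(k):
        for y in quasi_uniform_lengths(k):
            profile[x + y] = profile.get(x + y, 0) + 1
    return profile


def predicted_profile(k):
    """(n_{2h-2}, n_{2h-1}, n_{2h}) = (a^2, 2a(k-a), (k-a)^2), zero counts dropped."""
    h = (k - 1).bit_length()
    a = 2 ** h - k
    counts = {2 * h - 2: a * a, 2 * h - 1: 2 * a * (k - a), 2 * h: (k - a) ** 2}
    return {d: c for d, c in counts.items() if c > 0}


def kraft_sum(profile):
    """Exact Kraft sum of a depth profile."""
    return sum(Fraction(c, 2 ** d) for d, c in profile.items())
```

For example, `product_profile(5)` returns `{4: 9, 5: 12, 6: 4}`, matching $h = 3$, $a = 3$. When $a = 0$ (that is, $k = 2^h$), only depth $2h$ survives, which is the uniform tree discarded in the proof.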



Since $Q_k \cdot Q_k$ has fringe thickness at most 2, it has a representation $T_{a_g, c_g}$, for some parameters $a_g, c_g$, as defined in Lemma 7 with $N = k^2$. The case $a = 0$ (i.e., $k = 2^h$) is readily discarded as sub-optimal for $k > 2$. It corresponds to a uniform tree with $2^{2h}$ leaves, which cannot be optimal for $A_k$, since $p_{k^2} + p_{k^2-1} < p_1$ for that source. Also, we can assume that $a_g$ is such that Lemma 10 is satisfied, and that $n_{2h-2}$ and $n_{2h}$ can be written, respectively, as $n_{M-1}$ and $n_{M+1}$ in (41), with $j$ and $j'$ satisfying Lemma 13. Otherwise, $T_{a_g, c_g}$ is not optimal, and the corollary is proved. By Lemma 9, we can write $a^2 \le \frac{1}{2}(j+1)(j+2) < \frac{1}{2}(j+2)^2$, or $j > \sqrt{2}\,a - 2$. Similarly, we have $(k-a)^2 \le \frac{1}{2}(j'+1)(j'+2) < \frac{1}{2}(j'+2)^2$, or $j' > \sqrt{2}\,(k-a) - 2$. Adding up, we obtain $j + j' > \sqrt{2}\,k - 4$. Since $\sqrt{2}\,k - 4 \ge k$ whenever $k \ge 4/(\sqrt{2} - 1) \approx 9.66$, it follows that $j + j' > k$ for $k \ge 10$, contradicting Lemma 13. For the remaining cases, one verifies that $a_g$ violates Lemma 10 if $k \in \{7, 9\}$. For $k \in \{3, 5, 6\}$, one can easily verify, by direct inspection, that $T_{a_g, c_g}$ is sub-optimal for $A_k$. ■

From the profile (40)-(42) with $N = k^2$ and $c = c_k$ as defined in (19), letting $\gamma = (1-q)^2/(1-q^k)^2$, and carrying out the computations, we obtain

$$W_{M-1} = \gamma \sum_{i=1}^{j(j+1)/2 + r} p_i = \gamma \Bigl( \sum_{\ell=0}^{j-1} (\ell+1) q^{\ell} + r q^{j} \Bigr) = \frac{1 - \bigl(1 + (1-q)j - (1-q)^2 r\bigr) q^{j}}{(1-q^k)^2}.$$

Similarly, from the proof of Theorem 4, setting $j' = k - j - 2$ and $r' = 2r + A(j)$, we obtain

$$W_{M+1} = \gamma \sum_{i=0}^{j'(j'+1)/2 + r' - 1} p_{k^2 - i} = \gamma \Bigl( \sum_{\ell=0}^{j'-1} (\ell+1) q^{2k-2-\ell} + r' q^{2k-2-j'} \Bigr) = \frac{q^{2k} + q^{k+j}\bigl((k-j-2)(1-q)q - q^2 + (1-q)^2 (2r + A(j))\bigr)}{(1-q^k)^2}.$$
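The weights $W_{M-1}$ and $W_{M+1}$ are sums of the $j(j+1)/2 + r$ largest and the $j'(j'+1)/2 + r'$ smallest normalized weights $\gamma q^{i+j}$ of $A_k$, so their closed forms can be spot-checked numerically. In the sketch below, $W_{M+1}$ is written in terms of $j'$, using $q^{k+j} = q^{2k-2-j'}$ when $j' = k - j - 2$. The parameter `rp` stands for $r' = 2r + A(j)$, and the identity (52) is checked with $W_M = 1 - W_{M-1} - W_{M+1}$. The helper names and the tolerance are assumptions made for illustration.

```python
def normalized_weights(k, q):
    """Normalized weights gamma * q**(i + j) of A_k, most probable first."""
    gamma = (1 - q) ** 2 / (1 - q ** k) ** 2
    signatures = sorted(i + j for i in range(k) for j in range(k))
    return [gamma * q ** s for s in signatures]


def w_low_closed(k, q, j, r):
    # W_{M-1} = (1 - (1 + (1-q) j - (1-q)^2 r) q^j) / (1 - q^k)^2
    return (1 - (1 + (1 - q) * j - (1 - q) ** 2 * r) * q ** j) / (1 - q ** k) ** 2


def w_high_closed(k, q, jp, rp):
    # W_{M+1} = (q^{2k} + q^{2k-2-j'} (j'(1-q) q - q^2 + (1-q)^2 r')) / (1 - q^k)^2
    return (q ** (2 * k) + q ** (2 * k - 2 - jp)
            * (jp * (1 - q) * q - q ** 2 + (1 - q) ** 2 * rp)) / (1 - q ** k) ** 2


def check_weights(k, q, tol=1e-12):
    """Compare direct prefix/suffix sums of the sorted weights with the closed forms."""
    w = normalized_weights(k, q)
    for j in range(k):
        for r in range(j + 1):
            n = j * (j + 1) // 2 + r            # number of leaves at depth M - 1
            assert abs(sum(w[:n]) - w_low_closed(k, q, j, r)) < tol
    for jp in range(k):
        for rp in range(jp + 1):
            n = jp * (jp + 1) // 2 + rp         # number of leaves at depth M + 1
            assert abs(sum(w[len(w) - n:]) - w_high_closed(k, q, jp, rp)) < tol
    return True


def check_eq52(M, w_low, w_high, tol=1e-12):
    """(M-1) W_{M-1} + M W_M + (M+1) W_{M+1} == M + W_{M+1} - W_{M-1}."""
    w_mid = 1 - w_low - w_high
    lhs = (M - 1) * w_low + M * w_mid + (M + 1) * w_high
    return abs(lhs - (M + w_high - w_low)) < tol
```

Ties among equal weights within a signature do not affect the sums, which is why sorting by signature alone is enough here.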



Appendix C 
Proofs for Subsection IV-D



Proof of Corollary: By Theorem 2, the code length for $(a, b)$ under $\mathcal{C}_k$ is $\lambda_{T_k}(a \bmod k, b \bmod k) + 2 + \lfloor a/k \rfloor + \lfloor b/k \rfloor$. Writing $a = mk + i$ and $b = nk + j$ with $0 \le i, j < k$ and $m, n \ge 0$, the average code length under $\mathcal{C}_k$ is given by (51).

The result (20) now follows by substituting the above expressions for $W_{M-1}$ and $W_{M+1}$ in (52), substituting for $C_q(T_k)$ in (51), and applying appropriate algebraic simplifications. The result (21), in turn, follows by applying the relation $q^k = 1/2$.
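To make the structure behind (51) concrete, here is a minimal Python sketch of a pair code whose codeword for $(a, b)$ has length equal to the length of the $T_k$ codeword for $(a \bmod k, b \bmod k)$, plus $2 + \lfloor a/k \rfloor + \lfloor b/k \rfloor$. Two assumptions apply. First, $T_k$ is replaced by a Huffman code for the finite source $A_k$ (weights proportional to $q^{i+j}$); this is a swapped-in stand-in, not the tree of Theorem 4. Second, the bit layout (two unary quotients followed by the residue codeword) is chosen for illustration, since only the total length enters (51). With $q = 2^{-1/k}$, the sketch compares the truncated series for the average length with $2/(1-q^k) + C_q(T_k)$, where $2/(1-q^k) = 4$ because $q^k = 1/2$.

```python
import heapq
import itertools


def huffman_lengths(weights):
    """Binary Huffman codeword lengths for two or more weights."""
    tie = itertools.count()                 # tie-breaker: the heap never compares lists
    heap = [(w, next(tie), [s]) for s, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1                 # each merge puts these leaves one level deeper
        heapq.heappush(heap, (w1 + w2, next(tie), s1 + s2))
    return lengths


def canonical_codewords(lengths):
    """Canonical prefix codewords (bit strings) for lengths meeting Kraft's equality."""
    order = sorted(range(len(lengths)), key=lambda s: (lengths[s], s))
    words, code, prev = [None] * len(lengths), -1, lengths[order[0]]
    for s in order:
        code = (code + 1) << (lengths[s] - prev)
        prev = lengths[s]
        words[s] = format(code, f"0{prev}b")
    return words


def build_table(k, q):
    """HYPOTHETICAL stand-in for T_k: Huffman code for (i, j), 0 <= i, j < k."""
    pairs = [(i, j) for i in range(k) for j in range(k)]
    lengths = huffman_lengths([q ** (i + j) for i, j in pairs])
    return dict(zip(pairs, canonical_codewords(lengths)))


def unary(m):
    return "1" * m + "0"


def encode_pair(a, b, k, table):
    """ASSUMED layout: unary(a // k), unary(b // k), then the T_k codeword of the residues."""
    return unary(a // k) + unary(b // k) + table[(a % k, b % k)]


def decode_pair(bits, k, table):
    inverse = {w: p for p, w in table.items()}
    pos, quotients = 0, []
    for _ in range(2):                      # read the two unary quotients
        m = 0
        while bits[pos] == "1":
            m, pos = m + 1, pos + 1
        quotients.append(m)
        pos += 1                            # skip the terminating "0"
    word = ""
    while word not in inverse:              # prefix code: the first match is the codeword
        word += bits[pos]
        pos += 1
    i, j = inverse[word]
    return quotients[0] * k + i, quotients[1] * k + j


def average_length_series(k, q, table, blocks=60):
    """Truncated left side of (51): sum of P(a, b) * codeword length."""
    total = 0.0
    for a in range(blocks * k):
        for b in range(blocks * k):
            total += (1 - q) ** 2 * q ** (a + b) * len(encode_pair(a, b, k, table))
    return total


def average_length_closed(k, q, table):
    """Right side of (51): 2 / (1 - q^k) + C_q(T_k), with normalized weights."""
    gamma = (1 - q) ** 2 / (1 - q ** k) ** 2
    finite = sum(gamma * q ** (i + j) * len(w) for (i, j), w in table.items())
    return 2 / (1 - q ** k) + finite
```

The truncation at `blocks * k` drops probability mass of order $q^{\text{blocks} \cdot k} = 2^{-60}$, far below the comparison tolerance.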



Appendix D 
Layer transitions in the codes C-k 

In each layer transition described below, we assume that we start from a layer $L_s$ of type (x), and show how it unfolds into a layer $L_{s+1}$ of type (y), the transition being denoted (x)→(y). We denote by $d_s$ the depth of the shallowest node in $L_s$.
(i)→(i): The tree $qV_k$ in each of the $\ell$ groups $\mathcal{A}$ in $L_s$ unfolds, by the definition of $V_k$ (see also the corresponding figure), into a tree $qV_k$ and $2^k - 1$ leaves of weight $q^{s+1}$, which provides a group $\mathcal{A}$ for $L_{s+1}$. Hence, there are $\ell$ groups $\mathcal{A}$ in $L_{s+1}$, which include $(2^k - 1)\ell$ signatures $s+1$. This propagation of groups $\mathcal{A}$ occurs in the same way in all the other transitions below, and its discussion is omitted for those cases. There remain $s + 2 - (2^k - 1)\ell = 2^{k-1} + 1 + j$ signatures $s+1$, with $0 \le j \le 2^{k-1} - 4$ (recall that layers of type (i) exist only if $k > 2$). A quasi-uniform tree with $2^{k-1} + 2 + j$ leaves is built, rooted at $\mathcal{R}_s$. This tree has $2^{k-1} - (j+1) - 1$ leaves at depth $k-1$, which are labeled $s+1$, and $2(j+1) + 2$ leaves at depth $k$. Of the latter, $2(j+1) + 1$ are assigned label $s+1$, and one serves as the root of $\mathcal{R}_{s+1}$, consistent with a structure of type (i) for $s+1$ (and, correspondingly, $j+1$).

(i)→(ii): We have $j = 2^{k-1} - 3$. We let $\mathcal{R}_s$ be the root of a balanced tree of height $k$. Of its $2^k$ leaves, $2^k - 2$ are assigned the remaining $2^k - 2$ signatures $s+1$, one leaf serves as the root for $qU_{k-1}$, and the remaining leaf serves as the root for $\mathcal{R}_{s+1}$.

(ii)→(iii) ($k > 2$): The tree $qU_{k-1}$ in $L_s$ contributes $2^{k-1}$ leaves of signature $s+1$ to $L_{s+1}$, in addition to those contributed by the groups $\mathcal{A}$. There remain $2^{k-1} - 1$ signatures $s+1$, which are assigned to leaves of a balanced tree $U_{k-1}$ rooted at $\mathcal{R}_s$. The remaining leaf splits into two nodes: one is the root of a tree $qU_{k-1}$, and the other anchors $\mathcal{R}_{s+1}$.

(ii)→(iv) ($k = 2$): The tree $qU_1$ in $L_s$ contributes $2^{k-1} = 2$ leaves of signature $s+1$ to $L_{s+1}$, in addition to those contributed by the groups $\mathcal{A}$. The remaining signature $s+1$ is assigned to one leaf of a tree $U_1$ rooted at $\mathcal{R}_s$. The second leaf splits into two nodes: one is the root of a tree $qV_k$, and the other anchors $\mathcal{R}_{s+1}$.

(iii)→(iii): The construction from the previous transition is kept, except that one of the leaves of the tree $U_{k-1}$ rooted at $\mathcal{R}_s$ is split, making room for the additional signature $s+1$ resulting from the increase in $s$. Hence, the number of leaves at depth $d_s$ decreases by one, and the number of leaves at depth $d_s + 1$ increases by two. This process continues until $j = 2^k - 4$.

(iii)→(iv): This transition is identical to the previous one, except that a tree $qV_k$, instead of a tree $qU_{k-1}$, is attached as a sibling to $\mathcal{R}_{s+1}$.

(iv)→(v): The tree $qV_k$ from the previous transition provides $2^{k-1} - 1$ leaves of signature $s+1$, plus a tree $qV_k$. What started as a balanced tree of depth $k-1$ in the transition (ii)→(iii) has evolved into a balanced tree of depth $k$. All of its leaves are assigned signatures $s+1$, except for one, which serves as the root of $\mathcal{R}_{s+1}$.
(v)→(i) ($k > 2$): The tree $qV_k$ added in the previous transition generates a new group $\mathcal{A}$, consistent with the increment in $\ell$. All signatures $s+1$ now originate from the groups $\mathcal{A}$ or from $\mathcal{R}_s$, which brings the construction back to a layer of type (i), completing the cycle.

(v)→(ii) ($k = 2$): When $k = 2$, the transition occurs to a layer of type (ii), as described above for the initial transition from Case 1 to Case 2.
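Two pieces of the leaf bookkeeping in these transitions can be checked mechanically. The first is the depth profile of the quasi-uniform tree with $2^{k-1} + 2 + j$ leaves used in (i)→(i). The second is the statement in (iv)→(v) that splitting the leaves of a balanced tree of depth $k-1$ one at a time ends in a balanced tree of depth $k$. The Python sketch below tracks only leaf counts per depth; it does not model signatures or the attached trees $qU_{k-1}$ and $qV_k$. All helper names are hypothetical.

```python
from fractions import Fraction


def quasi_uniform_profile(n):
    """{depth: leaves} of a quasi-uniform binary tree with n >= 2 leaves."""
    d = (n - 1).bit_length()                # smallest d with 2**d >= n
    counts = {d - 1: 2 ** d - n, d: 2 * (n - 2 ** (d - 1))}
    return {depth: c for depth, c in counts.items() if c > 0}


def kraft(profile):
    """Exact Kraft sum; 1 means the leaves form a full binary tree."""
    return sum(Fraction(c, 2 ** d) for d, c in profile.items())


def check_type_i(k):
    """Type (i): 2**(k-1) + 2 + j leaves, 0 <= j <= 2**(k-1) - 4, give
    2**(k-1) - (j+1) - 1 leaves at depth k-1 and 2(j+1) + 2 at depth k."""
    for j in range(2 ** (k - 1) - 3):       # empty for k = 2: no type (i) layers
        prof = quasi_uniform_profile(2 ** (k - 1) + 2 + j)
        assert prof == {k - 1: 2 ** (k - 1) - (j + 1) - 1, k: 2 * (j + 1) + 2}
        assert kraft(prof) == 1
    return True


def grow_by_splits(k):
    """Split the leaves of U_{k-1} (all at depth k-1) one at a time; each split
    removes one leaf at depth k-1 and adds two leaves at depth k."""
    prof = {k - 1: 2 ** (k - 1), k: 0}
    while prof[k - 1] > 0:
        prof[k - 1] -= 1
        prof[k] += 2
        assert kraft(prof) == 1             # every intermediate tree stays full
    return prof
```

For instance, `grow_by_splits(4)` returns `{3: 0, 4: 16}`, a balanced tree of depth 4.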